{"cells":[{"cell_type":"code","metadata":{"tags":[],"cell_id":"982ee53d-d8ac-4f15-8bed-8a5cf6fd1308"},"source":"import pandas as pd\r\nimport numpy as np\r\nimport matplotlib.pyplot as plt\r\nimport seaborn as sns\r\nimport nltk\r\nfrom nltk import FreqDist\r\nfrom nltk.tokenize import word_tokenize\r\n%matplotlib inline","outputs":[]},{"cell_type":"markdown","source":"## Importing Data","metadata":{"tags":[],"cell_id":"1d0ffd10-1930-4a2c-8814-fc9c1cbbaeec"}},{"cell_type":"markdown","source":"The first step of data exploration is to understand the data at a high level. The easiest way to do that is to look at the first few rows with `df.head()`.","metadata":{"tags":[],"cell_id":"23962766-9003-40d8-aa26-2b57d3143613"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"24872ea4-b62c-4b5f-acfe-65b116c4ba16"},"source":"df = pd.read_csv(\"../Data/train.csv\")\r\ndf.head()","outputs":[{"output_type":"execute_result","execution_count":6,"data":{"application/vnd.deepnote.dataframe+json":{"variableDetails":{"dataframe":{"0":{"textID":"cb774db0d1","text":" I`d have responded, if I were going","selected_text":"I`d have responded, if I were going","sentiment":"neutral"},"1":{"textID":"549e992a42","text":" Sooo SAD I will miss you here in San Diego!!!","selected_text":"Sooo SAD","sentiment":"negative"},"2":{"textID":"088c60f138","text":"my boss is bullying me...","selected_text":"bullying me","sentiment":"negative"},"3":{"textID":"9642c003ef","text":" what interview! 
leave me alone","selected_text":"leave me alone","sentiment":"negative"},"4":{"textID":"358bd9e861","text":" Sons of ****, why couldn`t they put them on the releases we already bought","selected_text":"Sons of ****,","sentiment":"negative"}},"columns":[{"name":"textID","stats":{"count":5,"unique":5,"top":"358bd9e861","freq":1,"nan_count":0}},{"name":"text","stats":{"count":5,"unique":5,"top":" Sons of ****, why couldn`t they put them on the releases we already bought","freq":1,"nan_count":0}},{"name":"selected_text","stats":{"count":5,"unique":5,"top":"Sons of ****,","freq":1,"nan_count":0}},{"name":"sentiment","stats":{"count":5,"unique":2,"top":"negative","freq":4,"nan_count":0}}],"frequencyInfo":[{"frequencyData":[{"name":"cb774db0d1","frequency":0.2},{"name":"549e992a42","frequency":0.2},{"name":"3 others","frequency":0.6}],"type":"freq"},{"frequencyData":[{"name":" I`d have responded, if I were going","frequency":0.2},{"name":" Sooo SAD I will miss you here in San Diego!!!","frequency":0.2},{"name":"3 others","frequency":0.6}],"type":"freq"},{"frequencyData":[{"name":"I`d have responded, if I were going","frequency":0.2},{"name":"Sooo SAD","frequency":0.2},{"name":"3 others","frequency":0.6}],"type":"freq"},{"frequencyData":[{"name":"negative","frequency":0.8},{"name":"neutral","frequency":0.2}],"type":"freq"}]},"numElements":5,"numColumns":4},"text/plain":"       textID                                               text  \\\n0  cb774db0d1                I`d have responded, if I were going   \n1  549e992a42      Sooo SAD I will miss you here in San Diego!!!   \n2  088c60f138                          my boss is bullying me...   \n3  9642c003ef                     what interview! leave me alone   \n4  358bd9e861   Sons of ****, why couldn`t they put them on t...   
\n\n                         selected_text sentiment  \n0  I`d have responded, if I were going   neutral  \n1                             Sooo SAD  negative  \n2                          bullying me  negative  \n3                       leave me alone  negative  \n4                        Sons of ****,  negative  ","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>textID</th>\n      <th>text</th>\n      <th>selected_text</th>\n      <th>sentiment</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>cb774db0d1</td>\n      <td>I`d have responded, if I were going</td>\n      <td>I`d have responded, if I were going</td>\n      <td>neutral</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>549e992a42</td>\n      <td>Sooo SAD I will miss you here in San Diego!!!</td>\n      <td>Sooo SAD</td>\n      <td>negative</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>088c60f138</td>\n      <td>my boss is bullying me...</td>\n      <td>bullying me</td>\n      <td>negative</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>9642c003ef</td>\n      <td>what interview! 
leave me alone</td>\n      <td>leave me alone</td>\n      <td>negative</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>358bd9e861</td>\n      <td>Sons of ****, why couldn`t they put them on t...</td>\n      <td>Sons of ****,</td>\n      <td>negative</td>\n    </tr>\n  </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"code","metadata":{"tags":[],"cell_id":"024f33f0-c33a-4bbc-a314-89e0291599b6"},"source":"df.shape","outputs":[{"output_type":"execute_result","execution_count":7,"data":{"text/plain":"(27481, 4)"},"metadata":{}}]},{"cell_type":"markdown","source":"The training data consists of 27,481 samples with 4 columns each:\r\n\r\n1. `textID`: a unique identifier\r\n2. `text`: the tweet text\r\n3. `selected_text`: the text within the tweet contributing to the sentiment\r\n4. `sentiment`: the sentiment label","metadata":{"tags":[],"cell_id":"067ad34b-3815-454e-bc61-feca2e26c26f"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"0e79ada7-4551-4f50-99ed-10ef9a6580e3"},"source":"df.sentiment.value_counts()","outputs":[{"output_type":"execute_result","execution_count":8,"data":{"text/plain":"neutral     11118\npositive     8582\nnegative     7781\nName: sentiment, dtype: int64"},"metadata":{}}]},{"cell_type":"markdown","source":"The classes are not evenly split: 40.5% of the tweets are neutral, 31.2% are positive and 28.3% are negative. 
The imbalance is not too terrible however","metadata":{"tags":[],"cell_id":"19b139e0-993f-4766-bf7c-ed110376cce5"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"0be7de37-4e8c-41a7-a36a-543ec9aa78ed"},"source":"df.sentiment.value_counts().plot(kind=\"bar\")\r\nplt.title(\"Relative Frequency of Sentiment\")\r\nplt.xlabel(\"Sentiment\")\r\nplt.ylabel(\"Occurences\")\r\nplt.xticks(rotation=25)\r\nsns.set(style=\"darkgrid\")","outputs":[{"data":{"text/plain":"<Figure size 432x288 with 1 Axes>","image/png":"iVBORw0KGgoAAAANSUhEUgAAAZEAAAEoCAYAAACZ5MzqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3debxVZb3H8c8XEEWcUE6mgGI5oqUiiUOZiaE5YaWpZeKQZlqZdVO00nJKy+t4yyE1tRxTSyxNuQ7XMidwBic0DcgBBcTZkN/943m2Lo/nwGZx9t5ns7/v1+u8zlrPmp611zn7t55hPUsRgZmZWRk9Gp0BMzNrXg4iZmZWmoOImZmV5iBiZmalOYiYmVlpDiJmZlaag4gtFElbSpq6ENufLeknXZknq56kFSXdLulVSf/d4Lx8TdJNjcyDLTgHEUPSM5LelPSapOclXShpqRocZ29Jfy+mRcSBEXFsDY5VPKfKz8pdfZxFwAHAS8AyEfGD9gslDZR0taSXJL0i6RFJey/sQSUNlhSSelXSIuKSiBi5sPsukZeFuhFqdQ4iVrFjRCwFbABsCBzR4Px0hR0jYqnCz7/br1D8EmtRqwKTovOnjn8HTMnrrQB8HXihTnmzJuAgYh8QEc8DN5KCCQCSFpd0sqR/SXohV0H16Wh7SWMkPZWrRyZJ+mJOXwc4G9g0lwpm5fQLJR2Xpx+VtENhX70kTZc0NM9vIukfkmZJelDSlgt6foU74P0k/Qu4ZX77lrSapP/L5zRO0v9I+n1e9qG72FwK2jpP9yh8Ji9LulLS8u3yMjp/ti9J+lFhPz0lHVn4PCdIGiTpV+2rniSNlXRoJ+e8maR7c0niXkmbVT57YDRwWL4mW3ew+aeACyPi9YiYExH3R8QNhX3P63O7TdKxku7I+b9JUv+8+Pb8e1Y+9qbtS6r5szlI0pN5+2MlfTwfb3b+LHsX1t9B0gM5L/+Q9Ml21+S/JD2UP4crJC0hqS9wA7CyXGItJyL80+I/wDPA1nl6IPAwcHph+anAWGB5YGngOuDnedmWwNTCursCK5NuUHYDXgdWysv2Bv7e7tgXAsfl6aOASwrLtgcezdMDgJeB7fK+P5/n2+Z3Tu3SBwMBXAz0BfrMb9/AncApwOLAFsCrwO87Ov8OPs9DgLvy57o4cA5wWbu8/CbnY33gbWCdvPyH+VqsBSgvXwHYGPg30COv1x94A1ixg/NdHphJKkH0AvbI8yu0//w7+Rz/F7gD2B1Ypd2y+X1utwFPAWvm87sNOLHdufcq7O8Dfx95+bXAMsC6+bO5GfgYsCwwCRid190QeBEYDvQkBcdngMUL1+Qe0t/m8sCjwIGdXUP/LMD3R6Mz4J/G/+R/sNfyl2Pkf9Tl8jKRAsHHC+tvCvwzT8/zHxB4ABiVpz/wJZHT3vsSA1bPeVgyz18CHJWnDwd+127bGytfIvM4p1n55085v
fLl9bHCup3uG1gFmAP0LSy7lOqDyKPAiMKylYD/kL7QK3kZWFh+D7B7nn688tl1cH6PAp/P098Gru9kva8D97RLuxPYu/3n38n2/YATgYnAu/l6fqqaa0IKGj8uLDsI+Gu76zC/ILJ5YX4CcHhh/r+B0/L0WcCx7fLyOPDZwjXZs7DsF8DZ1fwN+2feP67OsoqdI2Jp0j/U2qS7W4A2YElgQq4mmAX8Nad/iKS9ClUKs4D1Cvuap4iYTPpy3FHSksBOpC9sSHXyu1b2m/f9adKX8rzOabn8s3O7ZVMK0/Pa98rAzIh4vbD+s9WcT2Hffyzs91HSl/GKhXWeL0y/AVQ6NQwi3cl35CJgzzy9J6ntoiMrd5DfZ0mliPmKiJkRMSYi1s15fgD4kyRR3TXp7NyqVWx/ebOD+cr+VgV+0C4vg0jn31V5sQ60eqOitRMR/5fryk8Gdib13HkTWDcips1rW0mrkqpmRgB3RsS7kh4glWYg3VnOz2WkKpcepAbfyTl9Cumud/8FPKXOFPPS6b7zOfWT1LcQSFYpbP86KchW1u/JBwPsFGDfiLijg30Pnk8epwAfBx7pYNnvgUckrQ+sA/ypk338m/QFW7QK6UZggUTES5JOJpXQlmfhrklXDx8+BTg+Io7vBnlpKS6JWEdOAz4vaf2ImEsKDKdK+giApAGStulgu76kf8jpeb19SCWRiheAgcXG0A5cDowEvsX7pRBIX5o7StomNzgvkRu1B5Y8x6JO9x0RzwLjgZ9J6i3p08COhW2fAJaQtL2kxYAfk9o+Ks4Gjs/BCEltkkZVma/zgGMlraHkk5JWAIiIqcC9pBLI1RHxZif7uB5YU9JXlToq7AYMAf5cTQYknSRpvbzt0qTrMjkiXmbhrsl0YC6pfaMr/AY4UNLw/Fn1zddk6Sq2fQFYQdKyXZSXluIgYh8SEdNJDc9H5aTDgcnAXZJmkxpb1+pgu0mkeuo7Sf+YnyA1ylbcQqpbf17SS50c+7m8/WbAFYX0KcAo4EjSF9AUUsPzQv8NV7Hvr5IabGcAR5M+m8q2r5Dq+s8DppFKJsXeWqeTOiXcJOlVUiP78CqzdgpwJXATMBs4n9RAXXER6TPurCqL/GW/A/ADUqP3YcAOEdHh59+BJYE/ktqVniaVanbK+y59TSLiDeB44I5c/bRJlfnpbH/jgf2B/yF1HJhMamOpZtvHSCXgp3Ne3DtrASg3LJlZlST9FFg9Ivac37o1zscWpNLAquF/ZGsQl0TMmlCuOjsEOM8BxBrJQcSsySg9uDmL1AvqtAZnx1qcq7PMzKw0l0TMzKw0BxEzMyut5R427N+/fwwePLjR2TAzaxoTJkx4KSI6HKWi5YLI4MGDGT9+fKOzYWbWNCR1OtSPq7PMzKw0BxEzMyvNQcTMzEpzEDEzs9IcRMzMrDQHETMzK81BxMzMSnMQMTOz0lruYcN6GjzmL43OQk09c+L2jc6CmTWYSyJmZlaag4iZmZXmIGJmZqU5iJiZWWkOImZmVpqDiJmZleYgYmZmpTmImJlZaQ4iZmZWmoOImZmV5iBiZmalOYiYmVlpDiJmZlaag4iZmZXmIGJmZqU5iJiZWWk1CyKSLpD0oqRHCmnLSxon6cn8u19Ol6QzJE2W9JCkoYVtRuf1n5Q0upC+kaSH8zZnSFKtzsXMzDpWy5LIhcC27dLGADdHxBrAzXke4AvAGvnnAOAsSEEHOBoYDmwMHF0JPHmd/QvbtT+WmZnVWM2CSETcDsxolzwKuChPXwTsXEi/OJK7gOUkrQRsA4yLiBkRMRMYB2ybly0TEXdFRAAXF/ZlZmZ1Uu82kRUj4rk8/TywYp4eAEwprDc1p80rfWoH6WZmVkcNa1jPJYiox7EkHSBpvKTx06dPr8chzcxaQr2DyAu5Kor8+8WcPg0YVFhvYE6bV/rADtI7FBHnRsSwiBjW1ta20CdhZmZJvYPIWKDSw2o0cG0hfa/cS2sT4JVc7XUjMFJSv
9ygPhK4MS+bLWmT3Ctrr8K+zMysTnrVaseSLgO2BPpLmkrqZXUicKWk/YBnga/k1a8HtgMmA28A+wBExAxJxwL35vWOiYhKY/1BpB5gfYAb8o+ZmdVRzYJIROzRyaIRHawbwMGd7OcC4IIO0scD6y1MHs3MbOH4iXUzMyvNQcTMzEpzEDEzs9IcRMzMrDQHETMzK81BxMzMSnMQMTOz0hxEzMysNAcRMzMrzUHEzMxKcxAxM7PSHETMzKw0BxEzMyvNQcTMzEpzEDEzs9IcRMzMrDQHETMzK81BxMzMSnMQMTOz0hxEzMysNAcRMzMrrVejM2DWXQ0e85dGZ6Gmnjlx+0ZnwRYBLomYmVlpDiJmZlaag4iZmZXmIGJmZqU5iJiZWWkOImZmVpqDiJmZldaQICLpUEkTJT0i6TJJS0haTdLdkiZLukJS77zu4nl+cl4+uLCfI3L645K2acS5mJm1sroHEUkDgO8CwyJiPaAnsDtwEnBqRKwOzAT2y5vsB8zM6afm9ZA0JG+3LrAt8GtJPet5LmZmra5R1Vm9gD6SegFLAs8BWwFX5eUXATvn6VF5nrx8hCTl9Msj4u2I+CcwGdi4Tvk3MzMaEEQiYhpwMvAvUvB4BZgAzIqIOXm1qcCAPD0AmJK3nZPXX6GY3sE2HyDpAEnjJY2fPn16156QmVkLa0R1Vj9SKWI1YGWgL6k6qmYi4tyIGBYRw9ra2mp5KDOzltKI6qytgX9GxPSI+A9wDbA5sFyu3gIYCEzL09OAQQB5+bLAy8X0DrYxM7M6aEQQ+RewiaQlc9vGCGAScCuwS15nNHBtnh6b58nLb4mIyOm7595bqwFrAPfU6RzMzIwGDAUfEXdLugq4D5gD3A+cC/wFuFzScTnt/LzJ+cDvJE0GZpB6ZBEREyVdSQpAc4CDI+Ldup6MmXVbi/JQ/t1pGP+GvE8kIo4Gjm6X/DQd9K6KiLeAXTvZz/HA8V2eQTMzq4qfWDczs9IcRMzMrDQHETMzK81BxMzMSnMQMTOz0hxEzMysNAcRMzMrbYGDiKQekpapRWbMzKy5VBVEJF0qaRlJfYFHgEmSfljbrJmZWXdXbUlkSETMJr3j4wbSCLxfr1muzMysKVQbRBaTtBgpiIzNo+9G7bJlZmbNoNogcg7wDOndH7dLWhWYXatMmZlZc6hqAMaIOAM4o5D0rKTP1SZLZmbWLKptWF9R0vmSbsjzQ3j/HR9mZtaiqq3OuhC4kfQ6W4AngO/VIkNmZtY8qg0i/SPiSmAuQETMAfwCKDOzFldtEHld0grkHlmSNgFeqVmuzMysKVT7ZsPvk95p/nFJdwBtvP8+dDMza1HV9s66T9JngbUAAY/nZ0XMzKyFVds762BgqYiYGBGPAEtJOqi2WTMzs+6u2jaR/SNiVmUmImYC+9cmS2Zm1iyqDSI9JakyI6kn0Ls2WTIzs2ZRbcP6X4ErJJ2T57+Z08zMrIVVG0QOJwWOb+X5ccB5NcmRmZk1jWp7Z80Fzso/ZmZmQJVBRNLmwE+BVfM2AiIiPla7rJmZWXdXbXXW+cChwAQ83ImZmWXVBpFXIuKGmubEzMyaTrVdfG+V9EtJm0oaWvkpe1BJy0m6StJjkh7N+11e0jhJT+bf/fK6knSGpMmSHioeV9LovP6Tkjw0vZlZnVVbEhmefw8rpAWwVcnjng78NSJ2kdQbWBI4Erg5Ik6UNAYYQ+oV9gVgjfwznNS4P1zS8sDROU8BTJA0Nj8IaWZmdVBt76wue4uhpGWBLYC9877fAd6RNArYMq92EXAbKYiMAi6OiADuyqWYlfK64yJiRt7vOGBb4LKuyquZmc1b6TcbStqv5DFXA6YDv5V0v6TzJPUFVoyI5/I6zwMr5ukBwJTC9lNzWmfpHeX/AEnjJY2fPn16yWybmVl7jXizYS9gKHBWRGwIvE6qunpPLnVEyf1/SEScGxHDImJYW1tbV+3WzKzlNeLNhlOBqRFxd56/ihRUXsjVV
OTfL+bl04BBhe0H5rTO0s3MrE7q/mbDiHgemCJprZw0AphEeulVpYfVaODaPD0W2Cv30tqE1N34OVLJaKSkfrkn18icZmZmddKoNxt+B7gk98x6GtiHFNCuzG0tzwJfyeteD2wHTAbeyOsSETMkHQvcm9c7ptLIbmZm9THfICKpB7AE0GVvNoyIB/hgd+GKER2sG8DBneznAuCCsvkwM7OFM98gEhFzJf0qN4JPrEOezMysSVTbJnKzpC8XX0xlZmZWbRD5JvAH4G1JsyW9Kml2DfNlZmZNoNon1peudUbMzKz5VPs+kS06So+I27s2O2Zm1kyq7eL7w8L0EsDGpHeLlB2A0czMFgHVVmftWJyXNAg4rSY5MjOzplFtw3p7U4F1ujIjZmbWfKptEzmT9wdE7AFsANxXq0yZmVlzqLZNZHxheg5wWUTcUYP8mJlZE6k2iFwFvBUR7wJI6ilpyYh4o3ZZMzOz7q7qJ9aBPoX5PsD/dn12zMysmVQbRJaIiNcqM3l6ydpkyczMmsWCvE9kaGVG0kbAm7XJkpmZNYtq20S+B/xB0r9JQ8F/FNitZrkyM7OmUO3DhvdKWpv0PhFYyPeJmJnZoqGq6ixJBwN9I+KRiHgEWErSQbXNmpmZdXfVtonsHxGzKjMRMRPYvzZZMjOzZlFtEOlZfCGVpJ5A79pkyczMmkW1Des3AldIOifPHwj8tTZZMjOzZlFtEPkJqfqq0g5yI3B+TXJkZmZNY55BRFIv4ARgH2BKTl4FeJpUFfZuTXNnZmbd2vzaRH4JLA98LCKGRsRQYDVgWeDkWmfOzMy6t/kFkR1IPbNerSTk6W8B29UyY2Zm1v3NL4hEREQHie/y/vtFzMysRc0viEyStFf7REl7Ao/VJktmZtYs5tc762DgGkn7AhNy2jDSUPBfrGXGzMys+5tnEImIacBwSVsB6+bk6yPi5prnzMzMur2qnliPiFsi4sz80yUBJL8d8X5Jf87zq0m6W9JkSVdI6p3TF8/zk/PywYV9HJHTH5e0TVfky8zMqlftsCe1cAjwaGH+JODUiFgdmAnsl9P3A2bm9FPzekgaAuxOKiFtC/w6D8diZmZ10pAgImkgsD1wXp4XsBXpXe4AFwE75+lReZ68fERefxRweUS8HRH/BCYDG9fnDMzMDBpXEjkNOAyYm+dXAGZFxJw8PxUYkKcHkJ+Wz8tfyeu/l97BNmZmVgd1DyKSdgBejIgJ81256455gKTxksZPnz69Xoc1M1vkNaIksjmwk6RngMtJ1VinA8vlsboABgLT8vQ0YBC8N5bXssDLxfQOtvmAiDg3IoZFxLC2trauPRszsxZW9yASEUdExMCIGExqGL8lIr4G3ArsklcbDVybp8fmefLyW/JT9GOB3XPvrdWANYB76nQaZmZG9UPB18PhwOWSjgPu5/2h5s8HfidpMjCDFHiIiImSrgQmAXOAg/NwLGZmVicNDSIRcRtwW55+mg56V0XEW8CunWx/PHB87XJoZmbz0sjnRMzMrMk5iJiZWWkOImZmVpqDiJmZleYgYmZmpTmImJlZaQ4iZmZWmoOImZmV5iBiZmalOYiYmVlpDiJmZlaag4iZmZXmIGJmZqU5iJiZWWkOImZmVpqDiJmZleYgYmZmpTmImJlZaQ4iZmZWmoOImZmV5iBiZmalOYiYmVlpDiJmZlaag4iZmZXmIGJmZqU5iJiZWWkOImZmVpqDiJmZlVb3ICJpkKRbJU2SNFHSITl9eUnjJD2Zf/fL6ZJ0hqTJkh6SNLSwr9F5/Sclja73uZiZtbpGlETmAD+IiCHAJsDBkoYAY4CbI2IN4OY8D/AFYI38cwBwFqSgAxwNDAc2Bo6uBB4zM6uPugeRiHguIu7L068CjwIDgFHARXm1i4Cd8/Qo4OJI7gKWk7QSsA0wLiJmRMRMYBywbR1Pxcys5TW0TUTSYGBD4G5gxYh4Li96HlgxTw8AphQ2m5rTOks3M7M6aVgQkbQUc
DXwvYiYXVwWEQFEFx7rAEnjJY2fPn16V+3WzKzlNSSISFqMFEAuiYhrcvILuZqK/PvFnD4NGFTYfGBO6yz9QyLi3IgYFhHD2trauu5EzMxaXCN6Zwk4H3g0Ik4pLBoLVHpYjQauLaTvlXtpbQK8kqu9bgRGSuqXG9RH5jQzM6uTXg045ubA14GHJT2Q044ETgSulLQf8CzwlbzsemA7YDLwBrAPQETMkHQscG9e75iImFGfUzAzM2hAEImIvwPqZPGIDtYP4OBO9nUBcEHX5c7MzBaEn1g3M7PSHETMzKw0BxEzMyvNQcTMzEpzEDEzs9IcRMzMrDQHETMzK81BxMzMSnMQMTOz0hxEzMysNAcRMzMrzUHEzMxKcxAxM7PSHETMzKw0BxEzMyvNQcTMzEpzEDEzs9IcRMzMrDQHETMzK81BxMzMSnMQMTOz0hxEzMysNAcRMzMrzUHEzMxKcxAxM7PSHETMzKw0BxEzMyvNQcTMzEpzEDEzs9KaPohI2lbS45ImSxrT6PyYmbWSpg4iknoCvwK+AAwB9pA0pLG5MjNrHU0dRICNgckR8XREvANcDoxqcJ7MzFpGr0ZnYCENAKYU5qcCw9uvJOkA4IA8+5qkx+uQt0boD7xUr4PppHodqWX4+jW3ul2/Bly7VTtb0OxBpCoRcS5wbqPzUWuSxkfEsEbnw8rx9WturXr9mr06axowqDA/MKeZmVkdNHsQuRdYQ9JqknoDuwNjG5wnM7OW0dTVWRExR9K3gRuBnsAFETGxwdlqpEW+ym4R5+vX3Fry+ikiGp0HMzNrUs1enWVmZg3kINLiJKnReTCz5uUg0sIkjQCWaHQ+rDzfBFijuU2khUm6DXgsIg6U1DMi3m10nqw6knpExNx2aQr/QzeVfBOg9teymTiItBBJPYAg/9FKWg+4JSI+0uCsWUmSPgWsFhFXNjovVp6kAcA6wAMR8VIz3RC4OquFRMTc/Ie5WC55PAI8IekIeG9AS2sCktokXQWcCGwu6VhJn2x0vmzBSOoj6UTgL8CWwB8lLdcsAQQcRBZpueRRnN9E0p+BM4C9cvLRwHcBXJ3V/XRwDTeTNIg0ltG9ETECeAX4ImnEBuuG2t+gSVpDUhuwLDAb2AT4A7ApsGMztXW5OmsR1L4oLGlVYDng18AxwLPABGCDiHhS0oPAmRFxXkd17VZfkjYDZueSYjF9FeAnwMVAP+B0YAbwIHBSRDwuqVdEzKl3nu3DJH0UGBERl3Sw7G/AScBc4BCgL/AacEZEXF/XjC4kl0QWEZKWyX+0VAKIpM9IGgv8HHgcGAmsAJwDzAR+mDf/KXBk3tYBpEEkVUaQWA14VckISd/M6c8BawIvkq7fQ8CREbFvDiAb5m2te1gSmAUgaUtJX5a0ZF52C7ABMB5YHTg6IraNiOslrS9p7bxdty+ROIgsAnJReX1gvzw/XNIXgV8Ad0XEVyPiLeCTwD7Al4FPAV+RtHpE/BHoI+lDw+hb7UlaTtLnKiWIfOe6ObAhaWii/SWNioj/ABOB7YAHgHuAIyRtLum3pBLKMg05CUNSL0nbSOqbk54BpkgaTaqy2pVU6gB4AZgVES8CtwNfkrRLfjvrpcAW8P4NYXfW1GNnWRIR70p6G/iGpG+Q2jxuIb1rZXkASYsBqwDP5d4fnyCNN/ZVUhXXmhHxakNOoAXlO8xtgXeA24CDcg+dLwCHAyOAT0bEGElLAV/NNwsPAXMi4lVJvyC9v2J/4DHgwIh4u/5nY/DeWH7rAztLmkDqCfkYqQpyHeBQUsP5fcDSwMp506NIN3VfJpVcdoqIp+qd/7IcRJpMsb0jN7ouR/oDfYvUwDoxIk7Ny8cCm0kaGBFTJc0gvZTrTtKd0XeAawAcQOorIkLSLODBfBPQhxT8z8nX6gzgFEnrRMTVkt4B9iCVTs7I+/gPqWrynMp+/bxP/VS6zLcrLfQHRpMe4v1hvmG7DzgkIk6W9HNSsHiLVHtAREwhlViuy
9e0s313S67OajKFALJYbr9YA/hbREwg9dDpK2loXv1h4A3SHS/AzaT2j3NIdzsXRsTseua/lXXQhXoC8HVJ+wInk0oZvwaIiAdJ1SGjcmP5dcDvSW/zHNpuP0jqkW8wHEDqpNJlXtKQ/LwOpJuy60k3c5W3HJ5Hek0FEXEtcBHwedIN3dKF/f0nX8cehe743Z6DSDfXQdfAXpIOBXaVtHhE3A3cKenbEfFPUvF597z6w8BTwJa57/nciHgxBw9Xe9RZ5Qte0m6SvhkR75CqGFeJiNtI78c5UFK/vMmFwFakhldyr531I2L/DvbdNF86zap9I7ekNfOzOr8H9pR0aUTcBfwyz1cCxC3AC5IOBoiIO4DtImL79jUA+To2VecWB5FurvDFs7WkNXPja5DqUCvvPf4ZcESeHgtsLOlIUo+rp4CfR8Ss+ua8tVXuKCvT+fdQSXcDu5G6WQPcCrRJ2hT4DbAZsFQO+n8jdeF9r8dVvlH40PMjVjuVz7pQC7B2XtQGXBMRQ0m9rHaXtEW+sXuG1N4I6e2r5wPvvTp3UbqOfk6km2lfpy1pE9JTyW+SeuMsTQoYFwCXAzdGxDuSXgS+GxGX5y6hI4FT8l2PNYikJUjfP2/nO9GPR8T3K89z5A4PPwbejogTJP0MWI/UwL4TcHsusViDSRpFan88AVgi/9+tQ6qCfAIQsFJE7ChpK+BsUkP5tcAvF9Xr6CDSTbR/yC+XOp7I9eWPkOrLjyfd3awCfJ/U8+pM0jMDNwBtEbFO3TNvH2oIzVVSJ5C6VU8gdb99A7iDVGf+GilYHEa6Qfg2cGFE3C5pY+CpiHg576tpxlFqdrnKqke7G7kewJWk/7fzSf+HF0TEMZK+BmwZEfsrPQz6DLBFRPxdaZTsF6Lw0Gj7//NFgXtndRORBkTsA+xMujOdLek44Fukh8w+SvoC2ig3wF0BfAO4Clic9EDh3Q3JfAsrNILOzfNL53rur5CeRv4S6TodTgr8G5OqkV8n3RAcTHp25ylSkCEi7sn76hkR7zqA1E/+rN/N3aqXjojnSF1x3wK2zaWPKcDVpK7xfYBekj4NfJZ0w/CRvK+b4f22lEgWqQACLol0G5KWAW4iPVn+HVIwWYXU9vGxiPhEYd3tSKPvviVpB9IDhS91sFurE0lDgONITymfApwGHBYRf86lkgOBJSPiJ5J6kwbb+y5waURc2qBst7xiCTJ/2S9N6sE4klRifJ3UUP4EMBiYm2/4niS1Rf6J9ADh3sBvScOWvFbY/yJfimz6Rp1FRe5q+zCpp85s4EbS3c+TpIbXPSTtJOlPwL6kkgkR8WcHkMaRtJikM0nPblxCqrbalvS/tTlARMwE3gbeyT12TgN+BPyuGEDa9/6x2ik0lle66VZGzt2Y9JDuUOAu0oOCPUnjk/24UJJ4EPiviHgtIo4H1omIEyLitWJj+aIeQMDVWd3NycA1klaOiH9Leop0jS4mPVS4L3BFRJzXyEza+3LV4hukBvOrcyB4gtSj6ov5AbJ/kB4SvDnSk+ZHFQN/5W61Fb5wuotC9ePapDaOlSQdDnyTFESuIA0hs2e+Zj8GTpO0HGnMq5uA1SRtFhH/yJ0kepJLKo04p0ZxdVY3I+ks4NWIOHVtCcoAAAP+SURBVExpyO/vAJMi4sLG5sw6I2ll4O/AyIiYnHvs7EOqG1+MVCV5O/CTXMde2c5PlzdI7hV3Cqmb/Lmkdo9VSd1xtyeNvvtAXncI6cagH+lp8/HAdGAM6Zq2dE2ASyLdz3nAdZJOiIgpkn4ZEdMbnSnrXC41XgucCuxIqldfj/SF0wNYphg8Cts5gDRIoQS5Vm636kNqtxKpQ8RKkp4jDTWzDfDTiLhb0pXAQcAupB6RMxtzBt2Hg0g3ExETlN501jv3/HEAaQ7nAPdLupj0hPkfSe0gERGvF+vgG5hH+6DTScPKrB0Rj0l6HOhNeq5jOHAsMAkYE2kYGkgvkZoFfN7/m4mrs8y6iKSTg
IER8bVG58Wqk69Zv4g4QFJ/4AekDi3HAMsXntVZ5J7v6CoOImZdRNJGwHXA6hHxhr94uj9Ja5K68o6IiGdzQ/uzEfFmXu4S5Hy4i69ZF4k0kvLPgaVzjyt/8XRzEfEEaeTdVfP8YxHxZuEBwZbrbbWgXBIxM7PSXBIxs5a3KIym2yguiZiZWWmOvmZmVpqDiJmZleYgYmZmpTmImFVJ0o8kTZT0kKQHJA0vsY8N8lD+lfmdJI3p2px+6JhbStqslsew1uVhT8yqoPQO9B2AoZFeddufNETGgtqA9K7t6wEiYiwwtssy2rEtSW9S/EeNj2MtyL2zzKog6UvAPhGxY7v0jUijwS4FvATsHRHPSbqN9KbJz5GG8d8vz08mvQ1vGunBxD7AsIj4tqQLSa/K3ZA0AvC+wF7ApsDdEbF3PuZI0guRFie9EXGf/B6LZ4CLSINALgbsShrC4y7gXdLIs9+JiL917adjrczVWWbVuQkYJOkJSb+W9Nk8nPiZwC4RsRFwAendFBW9ImJj4HvA0RHxDnAU6Z0wG0TEFR0cpx8paBxKKqGcCqwLfCJXhfUnvT5564gYShqW/PuF7V/K6WeRXpr0DHA2cGo+pgOIdSlXZ5lVId/pbwR8hlS6uIL0Otz1gHF5lIyeQHHI92vy7wmkV6tW47r8pr2HgRci4mEASRPzPgYCQ4A78jF7A3d2cswvVX+GZuU4iJhVKb//4zbgtvwlfzAwMSI27WSTt/Pvd6n+f62yzdzCdGW+V97XuIjYowuPaVaaq7PMqiBpLUlrFJI2AB4F2nKje+V96+vOZ1evkl5aVdZdwOaSVs/H7JtHoq3lMc065SBiVp2lgIskTZL0EKlK6SjSG+5OkvQg8AAwv660twJDchfh3RY0E/lFSHsDl+V83AmsPZ/NriO97/0BSZ9Z0GOazYt7Z5mZWWkuiZiZWWkOImZmVpqDiJmZleYgYmZmpTmImJlZaQ4iZmZWmoOImZmV5iBiZmal/T/ZxpRTyshbdwAAAABJRU5ErkJggg==\n"},"metadata":{"needs_background":"light"},"output_type":"display_data"}]},{"cell_type":"markdown","source":"Now lets take a look at some of the tweets to understand the length of the text samples.","metadata":{"tags":[],"cell_id":"5325ebaf-7c75-4658-9a38-1d1a8f32e25d"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"a59dfd25-9ec2-406b-800a-f9253c9bb27f"},"source":"df['text_len']  = df['text'].str.len()\r\ndf.text_len.value_counts()","outputs":[{"output_type":"execute_result","execution_count":10,"data":{"text/plain":"41.0     308\n46.0     301\n48.0     301\n42.0     298\n45.0     295\n        ... 
\n4.0        8\n3.0        5\n139.0      4\n141.0      2\n140.0      1\nName: text_len, Length: 139, dtype: int64"},"metadata":{}}]},{"cell_type":"markdown","source":"It makes sense that the tweets are capped at 140 characters, as per Twitter's limit, but let's investigate the two occurrences of 141 characters.","metadata":{"tags":[],"cell_id":"c363fc57-6dfd-4a27-9312-908b67ffaea7"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"580e5321-767d-4e80-8aaa-f71b99f9fd7b"},"source":"df[df[\"text\"].str.len()>140]","outputs":[{"output_type":"execute_result","execution_count":11,"data":{"application/vnd.deepnote.dataframe+json":{"variableDetails":{"dataframe":{"3138":{"textID":"0d64ba9efd","text":"Is so freaking bored on the bus. Hate being poor, ï¿½4.80 return for a 10 min train or ï¿½2 return for an HOUR long bus.ipod has no battery 2","selected_text":"bored","sentiment":"negative","text_len":141},"27318":{"textID":"d370238b6b","text":"just saw an advert for ATTICS TO EDEN on tv  out today and only ï¿½9.99 from HMV...so I`m not sure why I had to pay ï¿½13 at HMV...never mind","selected_text":"just saw an advert for ATTICS TO EDEN on tv  out today and only ï¿½9.99 from HMV...so I`m not sure why I had to pay ï¿½13 at HMV...never mind","sentiment":"neutral","text_len":141}},"columns":[{"name":"textID","stats":{"count":2,"unique":2,"top":"d370238b6b","freq":1,"nan_count":0}},{"name":"text","stats":{"count":2,"unique":2,"top":"just saw an advert for ATTICS TO EDEN on tv  out today and only ï¿½9.99 from HMV...so I`m not sure why I had to pay ï¿½13 at HMV...never mind","freq":1,"nan_count":0}},{"name":"selected_text","stats":{"count":2,"unique":2,"top":"just saw an advert for ATTICS TO EDEN on tv  out today and only ï¿½9.99 from HMV...so I`m not sure why I had to pay ï¿½13 at HMV...never 
mind","freq":1,"nan_count":0}},{"name":"sentiment","stats":{"count":2,"unique":2,"top":"neutral","freq":1,"nan_count":0}},{"name":"text_len","stats":{"count":2,"mean":141,"std":0,"min":141,"25%":141,"50%":141,"75%":141,"max":141,"nan_count":0}}],"frequencyInfo":[{"frequencyData":[{"name":"0d64ba9efd","frequency":0.5},{"name":"d370238b6b","frequency":0.5}],"type":"freq"},{"frequencyData":[{"name":"Is so freaking bored on the bus. Hate being poor, ï¿½4.80 return for a 10 min train or ï¿½2 return for an HOUR long bus.ipod has no battery 2","frequency":0.5},{"name":"just saw an advert for ATTICS TO EDEN on tv  out today and only ï¿½9.99 from HMV...so I`m not sure why I had to pay ï¿½13 at HMV...never mind","frequency":0.5}],"type":"freq"},{"frequencyData":[{"name":"bored","frequency":0.5},{"name":"just saw an advert for ATTICS TO EDEN on tv  out today and only ï¿½9.99 from HMV...so I`m not sure why I had to pay ï¿½13 at HMV...never mind","frequency":0.5}],"type":"freq"},{"frequencyData":[{"name":"negative","frequency":0.5},{"name":"neutral","frequency":0.5}],"type":"freq"},{"frequencyData":[{"x":140.5,"y":0},{"x":140.6,"y":0},{"x":140.7,"y":0},{"x":140.8,"y":0},{"x":140.9,"y":0},{"x":141,"y":2},{"x":141.1,"y":0},{"x":141.2,"y":0},{"x":141.3,"y":0},{"x":141.4,"y":0}],"type":"hist"}]},"numElements":2,"numColumns":5},"text/plain":"           textID                                               text  \\\n3138   0d64ba9efd  Is so freaking bored on the bus. Hate being po...   \n27318  d370238b6b  just saw an advert for ATTICS TO EDEN on tv  o...   \n\n                                           selected_text sentiment  text_len  \n3138                                               bored  negative     141.0  \n27318  just saw an advert for ATTICS TO EDEN on tv  o...   
neutral     141.0  ","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>textID</th>\n      <th>text</th>\n      <th>selected_text</th>\n      <th>sentiment</th>\n      <th>text_len</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>3138</th>\n      <td>0d64ba9efd</td>\n      <td>Is so freaking bored on the bus. Hate being po...</td>\n      <td>bored</td>\n      <td>negative</td>\n      <td>141.0</td>\n    </tr>\n    <tr>\n      <th>27318</th>\n      <td>d370238b6b</td>\n      <td>just saw an advert for ATTICS TO EDEN on tv  o...</td>\n      <td>just saw an advert for ATTICS TO EDEN on tv  o...</td>\n      <td>neutral</td>\n      <td>141.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"markdown","source":"Let's look a little closer at these two cases.","metadata":{"tags":[],"cell_id":"12354517-f92c-45a6-9ecf-a1a20919733e"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"121d656c-5aff-4421-b283-146919ebac08"},"source":"df.iloc[3138].text","outputs":[{"output_type":"execute_result","execution_count":12,"data":{"text/plain":"'Is so freaking bored on the bus. 
Hate being poor, ï¿½4.80 return for a 10 min train or ï¿½2 return for an HOUR long bus.ipod has no battery 2'"},"metadata":{}}]},{"cell_type":"code","metadata":{"tags":[],"cell_id":"eb27811d-dc2e-4b76-8ce6-291ae01926b5"},"source":"df.iloc[27318].text","outputs":[{"output_type":"execute_result","execution_count":13,"data":{"text/plain":"'just saw an advert for ATTICS TO EDEN on tv  out today and only ï¿½9.99 from HMV...so I`m not sure why I had to pay ï¿½13 at HMV...never mind'"},"metadata":{}}]},{"cell_type":"markdown","source":"Interestingly, both cases contain ï¿½ in the text body. This looks like an encoding artifact rather than a byte order mark (which typically shows up as ï»¿ at the start of a file): ï¿½ is what the UTF-8 bytes of the Unicode replacement character (U+FFFD) look like when read in a single-byte encoding. In both tweets it appears where a currency symbol would be expected, and since each occurrence takes up three characters it probably explains why these tweets exceed 140 characters. It shouldn't cause many problems, but it should be dealt with when cleaning the data. It is also worth investigating how many tweets contain it. More info on the related byte order mark issue can be found [here](https://stackoverflow.com/questions/18845976/whats-%C3%AF-sign-at-the-beginning-of-my-source-file)","metadata":{"tags":[],"cell_id":"bd147c5a-fe6b-4802-b927-357d4cbf51e1"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"71d69aff-ce4d-40a8-96a0-e23f3eca79f5"},"source":"df.text_len.plot(kind=\"hist\", bins=15, color=\"r\")\r\nplt.title(\"Distribution of Text Lengths\")\r\nplt.xlabel(\"Text Length (Characters)\")\r\nplt.ylabel(\"Occurrences\")\r\nplt.xticks(rotation=25)","outputs":[{"output_type":"execute_result","execution_count":14,"data":{"text/plain":"(array([-20.,   0.,  20.,  40.,  60.,  80., 100., 120., 140., 160.]),\n <a list of 10 Text xticklabel objects>)"},"metadata":{}},{"data":{"text/plain":"<Figure size 432x288 with 1 
Axes>","image/png":"iVBORw0KGgoAAAANSUhEUgAAAZMAAAEkCAYAAADq09ysAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3de1zO9/8/8EdXRxLpgDKbQ9Mc1kRqhFAKi7YRzRjRHDaWs+YQM4zYATPiuy825jRfESZiPnNoo2Ex24xR6bwOdNBVXdfr94ef92ehuvS+Dl3zuN9ublzv1/v9ej+vt7oe1/v4MhFCCBAREcmgMHQBRERk/BgmREQkG8OEiIhkY5gQEZFsDBMiIpKNYUJERLIxTEhvIiMjsW7dOq30lZ6eDnd3d6hUKgDAqFGjsGfPHq30DQBhYWHYt2+f1vrT1KeffgovLy94e3vrfd3GYNiwYdi/f7+hy6DHMDN0AfTv0LdvX/z9998wNTWFqakpXFxcEBQUhOHDh0OhuP+dZfHixRr3tWTJEnTv3r3KeZydnXHx4kWt1L527VokJydj1apV0rT/+Z//0UrfTyI9PR2bN2/G999/D3t7+0ptBw4cwMKFCwEAKpUKZWVlqFevntQuZ1vs2LEDcXFx2LJlS5XzDBs2DG+++SaCgoJqvZ4ntWrVKuTn52Pp0qV6WyfVHsOEtGbDhg3o3r07CgsLce7cOSxduhRJSUn46KOPtLqeiooKmJn9+35009PTYWtr+0iQAMDgwYMxePBgAMBPP/2EWbNm4YcfftB3iURV4mEu0jobGxv4+vris88+w759+3Dt2jUAQEREBD799FMAQF5eHiZMmAAPDw94enpixIgRUKvVmDVrFtLT0zFx4kS4u7tj06ZNuH37NlxdXbFnzx707t0bo0ePlqZVVFRI601JScHQoUPRuXNnTJo0CQUFBQDuf/j26tWrUo19+/bF2bNn8cMPPyA6Ohrfffcd3N3dpQ/sfx42U6vV+OKLL9CnTx9069YNs2fPRmFhIQBIdezbtw+9e/eGl5cX1q9fX+W2KSwsxOzZs/Hyyy+jT58++OKLL6BWq3H27FmMHTsW2dnZcHd3R0RExBNv94yMDEyaNAleXl7w9fXFjh07AABCCIwePVra9gDwzjvvYNGiRbh69SqWLVuGc+fOwd3dvVaH1xITExEcHAwPDw+89tpr+Pnnn6W2YcOG4fPPP8ewYcPQuXNnvP3227hz547UvmfPHvj4+ODll1/Gpk2b4O3tjcTERMTHx2PLli2IiYmBu7s7hg4dKi2Tmpr62P5KSkowbdo0eHp6wsPDA8HBwZXWRTomiLSgT58+4syZM49M9/HxEdu3bxdCCDFnzhzxySefCCGEWLVqlViwYIEoKysTZWVl4vz580KtVj+2r9TUVNG2bVsxa9YsUVxcLO7duydNKy8vF0IIMXLkSNGjRw/xxx9/iOLiYjF58mQxY8YMIYQQP/74o+jZs2eV9a5Zs0aa94GRI0eK3bt3CyGE2LNnj/Dz8xMpKSmiqKhIvPvuu2LmzJmVaps3b564d++e+O2330SHDh3E9evXH7udZs2aJSZOnCgKCwtFamqq8Pf3l9bzuDof53HzVVRUiMDAQBEdHS2USqX466+/hI+Pj/jpp5+EEEKkp6cLT09PkZiYKHbv3i38/f1FSUmJEEKIb775RowePbradQYHB4uYmJhHpqempgpPT09x5swZoVKpxMmTJ4WXl5coKCiQlvP39xfJycmiuLhYDB8+XKxZs0YIIcSvv/4q3N3dxcWLF4VSqRQffvihaNeunTh//rwQQoiVK1eKuXPnPlJHVf1t2bJFTJkyRdy7d0+Ul5eLX375RRQXF9e4PUk7uGdCOtWkSZPHfjs0MzNDTk4O0tPTYW5uDg8PD5iYmFTb15QpU1C/fn1YWVk9tj0oKAht27ZF/fr1ER4ejiNHjkgn
6OWIjY3FmDFj0KJFC1hbW2P69Ok4fPhwpb2iyZMnw8rKCi+88AJeeOEF/P7774/0o1KpcPjwYcyYMQMNGjTAM888g9DQUBw4cEB2jT///DOUSiXGjx8PCwsLtGrVCq+//joOHToEAHBycsK8efMwa9YsrFy5EitXrqx0zqW2YmJi0K9fP3Tv3h0KhQI+Pj5wcXHBmTNnpHmGDRuGZ599FvXr10dAQAB+++03AMCRI0cQEBCATp06wcLCAlOnToVara5xnVX1Z2Zmhry8PKSkpMDMzAxubm6oX7++7PdImvn3HXimOiUrKwuNGjV6ZPq4cePw+eefY+zYsQCA4cOHY/z48dX21axZs2rbnZycpH87OzujvLwc+fn5tai6suzsbDRv3lx63bx5c1RUVCA3N1ea5uDgIP27Xr16KCkpeaSf/Px8lJeXw9nZuVKdWVlZsmtMT09HWloaPDw8pGkqlarSRQz+/v5YtmwZ2rVrBzc3N9nrBIC0tDQcOnQIR44ckaZVVFQgOztbev3PbWNlZSVtm+zs7Er/Zw0aNICNjU2N66yqv+DgYPz999947733UFJSgldffRXh4eEwNTWt/RskjTFMSGeSkpKQlZWFLl26PNLWoEEDREREICIiAteuXcPo0aPx4osvolu3blX2V9OeS0ZGRqV/m5ubo3HjxqhXrx5KS0ulNpVKhby8PI37bdKkCdLS0qTX6enpMDMzg729PTIzM6td9p8aN24Mc3NzpKenw8XFRaqzadOmGvdRlWbNmqF169aIjY2tcp6VK1eiQ4cOuH79Oo4dO4Z+/foBqPn9V8fJyQnBwcFYsGDBEy/r6OhYafsVFRVJ56JqU5eFhQXCw8MRHh6O1NRUjBs3Di4uLtJ5MNItHuYirSsqKsL333+P6dOnY/DgwXB1dX1knu+//x7JyckQQsDGxgampqbSh4eDgwNSU1OfeL0HDhzA9evXce/ePaxevRoBAQEwNTVFq1atoFQqcfLkSZSXl2P9+vUoKyuTlrO3t0daWlqVh1gCAwOxdetWpKamori4GJ9++ikGDBjwxFeUmZqaon///vj0009RVFSEtLQ0bN68WSsfdg8Ce8uWLVAqlaioqMDvv/+OX3/9FQBw+vRpfPfdd1ixYgU++ugjLFy4EH///TeA+9s7IyMD5eXl1a6joqICSqVS+lNRUYHXXnsNR44cQUJCAlQqFUpLS5GQkICcnJwaax4wYACOHj2KpKQklJWVYfXq1dJl5MD9/5fbt29DaDhKxtmzZ3H9+nWo1WpYW1vD1NS0Un+kW9zSpDUPrsDy8fHBhg0bEBoaWuVlwcnJyQgNDYW7uzuGDx+ON954Ay+//DIAYPz48Vi/fj08PDzw5Zdfarz+oKAgREREwNvbG2VlZZg3bx6A+1eXLVy4EPPnz0evXr1Qr169SofM+vfvDwDw8vLCa6+99ki/Q4YMweDBgzFy5Ej4+vrCwsKiVt/EAWDBggWoV68e/Pz8MGLECAQGBmLIkCG16uufzM3NsXHjRly8eFG66mzRokUoKSnBnTt3MHfuXCxevBgODg7o3r07Bg4ciPnz5wMAevbsiWeeeQbdu3d/5Kq3f5o7dy7c3NykPwsXLkSLFi2wZs0arFmzRrpCbevWrRqd+2jfvj1mz56NyZMno1evXnB0dISNjQ0sLCwAAK+88gpKS0vh6emJkJCQGvvLysrCO++8g86dO2PQoEHw8fHBwIEDNdyCJJeJ0DT2iYh06M6dO/D09MSpU6fQpEkTQ5dDT4h7JkRkMMePH0dpaSmKi4uxfPlyvPTSSwwSI8UwISKDOXLkCLy9veHj44OsrKxKj7Qh48LDXEREJBv3TIiISDaGCRERycYwISIi2Z7qO+Dz84uhVj/+lJG9fQPk5hbpuaLaM7Z6AeOrmfXqFuvVLW3Uq1CYoHFj68e2PdVholaLKsPkQbsxMbZ6AeOrmfXqFuvVLV3Wy8NcREQkG8OEiIhkY5gQEZFsDBMi
IpKNYUJERLIxTIiISDaGCRERyfZU32dChtPY5v4ASI6ONY/5rYmKUiXyC8tqnpGIdIJhQgZhZmWJM0HyRxh8oNueHVoLpgcYUESaY5jQv4LCwkKr4QQA3vv3AgwTIo3wnAkREcnGMCEiItkYJkREJBvDhIiIZGOYEBGRbHq5mis/Px+zZ89GSkoKLCws8Nxzz2Hx4sWws7ODq6sr2rZtC4Xifq5FRUXB1dUVAHDixAlERUVBpVKhQ4cO+Oijj1CvXr0a20j7GttYwMzK0tBlEFEdpZcwMTExQVhYGLy8vAAAK1aswKpVq7Bs2TIAwM6dO2FtXXn0ruLiYixYsADbt29Hy5YtMW/ePHz55ZeYPHlytW2kG9q+L8R7/16t9UVEhqeXw1y2trZSkABAp06dkJ6eXu0yP/zwAzp27IiWLVsCAEJCQvDdd9/V2EZERPqn95sW1Wo1duzYgb59+0rTRo0aBZVKhV69emHKlCmwsLBARkYGnJ2dpXmcnZ2RkZEBANW2ERGR/uk9TD788EPUr18fI0eOBACcPHkSTk5OKCoqwqxZs7Bu3TpMmzZNL7XY2zeotl3bj+fQNWOr1xg8vE2NbRuzXt1ivf+l1zBZsWIFkpOTsWHDBumEu5OTEwCgQYMGCA4OxubNm6XpP/30k7Rsenq6NG91bU8iN7cIarV4bJujow1ycgqfuE9D0XW9xvZLoy3/3Kb8mdAt1qtb2qhXoTCp8ku43i4N/uSTT3DlyhWsW7cOFhb3nxh7584dlJaWAgAqKioQFxeHdu3aAQB69uyJy5cv49atWwDun6QfMGBAjW1ERKR/etkz+fPPPxEdHY2WLVsiJCQEAPDMM88gLCwMkZGRMDExQUVFBdzd3REeHg7g/p7K4sWLMWHCBKjVarRr1w7z5s2rsY2IiPRPL2Hy/PPP448//nhsW2xsbJXL+fn5wc/P74nbiIhIv3gHPBERycYwISIi2RgmREQkG8OEiIhkY5gQEZFsDBMiIpKNYUJERLIxTIiISDaGCRERycYwISIi2RgmREQkG8OEiIhkY5gQEZFsDBMiIpJN78P2kn40trGAmZWlocswauqyMq0O21tRqkR+YZncsojqJIbJv5SZlSXOBA3RWn/e+/dqrS9jobCw0P42ZJjQvxQPcxERkWwMEyIiko1hQkREsjFMiIhINoYJERHJxjAhIiLZGCZERCQbw4SIiGRjmBARkWwMEyIiko1hQkREsjFMiIhINoYJERHJxjAhIiLZ9BIm+fn5ePvttxEQEIBBgwZh8uTJyMvLAwBcunQJgwcPRkBAAMaOHYvc3Fxpudq2ERGRfuklTExMTBAWFoa4uDjExsaiRYsWWLVqFdRqNWbNmoXIyEjExcXBw8MDq1atAoBatxERkf7pJUxsbW3h5eUlve7UqRPS09Nx5coVWFpawsPDAwAQEhKCI0eOAECt24iISP/0PtKiWq3Gjh070LdvX2RkZMDZ2Vlqs7Ozg1qtRkFBQa3bbG1t9fp+iDT1uGGA5eAwwFSX6D1MPvzwQ9SvXx8jR47EsWPH9L36SuztG1Tbrs1ffH0wtnqfNroYBtjRyrLaeYztZ4L16pYu69VrmKxYsQLJycnYsGEDFAoFnJyckJ6eLrXn5eVBoVDA1ta21m1PIje3CGq1eGybo6MNcnIKn/AdGs7D9RrbDznVTnU/o8b+M1zXPY31KhQmVX4J19ulwZ988gmuXLmCdevWwcLCAgDQsWNHlJaWIjExEQCwc+dO9O/fX1YbERHpn172TP78809ER0ejZcuWCAkJAQA888wzWLduHaKiorBw4UIolUo0b94cK1euBAAoFIpatRERkf7pJUyef/55/PHHH49t69y5M2JjY7XaRkRE+sU74ImISDaGCRERycYwISIi2RgmREQkG8OEiIhkY5gQEZFsen+cChFphybP+nrSJyHweV9UWwwTIiOl7Wd9Afef9wWGCdUCD3MREZFsDBMiIpJN48Nc169f
h62tLRwcHFBcXIwvv/wSCoUC48aNQ7169XRZIxER1XEa75lMnz4dd+/eBXD/UfLnz5/HpUuXEBkZqbPiiIjIOGi8Z5KWlobWrVtDCIFjx47h0KFDsLKygq+vry7rIyIiI6BxmFhaWqKoqAg3btyAk5MT7OzsUFFRAaVSqcv6iIjICGgcJoGBgRg9ejSKi4sxcuRIAMDVq1fxzDPP6Ky4p0VjGwuY1TD8qiY4uiIRGYrGYTJ37lycPn0aZmZmePnllwEAJiYmeP/993VW3NPCzMpSN/cLEBHpyRPdtNijRw9kZGTg0qVL6NSpE1588UVd1UVEREZE46u50tPTERISggEDBiA0NBQAcOTIEcybN09nxRERkXHQOEwiIyPRu3dvXLhwAWZm93dovL29cfbsWZ0VR0RExkHjw1yXL1/Gxo0boVAoYGJiAgCwsbFBYWGhzoojIv3S5OGRT4IPjnx6aBwm9vb2SE5ORqtWraRp169fh5OTk04KIyL90/bDI/ngyKeHxoe5xo4di4kTJ2Lv3r2oqKjAwYMHMW3aNLz99tu6rI+IiIyAxnsmQ4cOha2tLXbt2gUnJyfExMQgPDwcfn5+uqyPiIzYkx42q2leHjaru57o0mA/Pz+GBxFpjIfNnh4aH+ZasmQJLly4UGnahQsXsHTpUq0XRURExkXjMDl48CA6duxYaVrHjh1x8OBBrRdFRETGRePDXCYmJhBCVJqmUqmgVqu1XhQRkT7IfS7e487xPK3ndTQOEw8PD3z22WeYNWsWFAoF1Go11q5dCw8PD13WR0SkMzp7Lh7DpGrz5s3DhAkT0KNHDzg7OyMjIwOOjo7YsGGDLusjIiIjoHGYNGvWDPv27cMvv/yCzMxMODk5wc3NDQoFh5EnInrgaX2KwBNdGqxQKODu7l7pPIlardYoUFasWIG4uDikpaUhNjYWbdu2BQD07dsXFhYWsLS8f9xy5syZ6NmzJwBIwwIrlUo0b94cK1euhL29fY1tRESG8rReDq3xbsWvv/6K4cOHo1OnTujQoQM6dOiA9u3bo0OHDhot7+vri+3bt6N58+aPtK1Zswb79+/H/v37pSBRq9WYNWsWIiMjERcXBw8PD6xatarGNiIi0j+N90wiIiLQp08fLFu2DFZWVk+8oic9UX/lyhVYWlpKy4WEhMDX1xcfffRRtW1ERKR/GodJWloapk2bJj0xWJtmzpwJIQS6dOmC6dOno2HDhsjIyICzs7M0j52dHdRqNQoKCqpts7W11Xp9RERUPY3DpF+/fjh9+rR0GEpbtm/fDicnJ5SVlWHp0qVYvHix3g5Z2ds3qLadY6oT1T1P4++ltt6zLredxmGiVCoxefJkdOnSBQ4ODpXaoqKial3Ag0fYW1hYYMSIEZg0aZI0PT09XZovLy8PCoUCtra21bY9idzcIqjV4rFtjo42yMnRz1gtT+MvB1FtafP30lh+97TxnrXxmaZQmFT5JVzjMHFxcYGLi4usQh5WUlIClUoFGxsbCCFw+PBhtGvXDsD9R7WUlpYiMTERHh4e2LlzJ/r3719jGxER6Z/GYTJ58mRZK1qyZAmOHj2Kv//+G6GhobC1tcWGDRswZcoU6bEsbdq0wcKFCwHcvww5KioKCxcurHT5b01tRESkf090n8mZM2dw6NAh5OXlYcOGDbh8+TKKiorQrVu3GpedP38+5s+f/8j0mJiYKpfp3LkzYmNjn7iNiIj0S+P7TL7++mssWrQILVu2xPnz5wEAVlZWWL16tc6KIyIi46BxmGzduhWbN2/G+PHjpTveW7dujZs3b+qsOCIiMg4ah0lxcbF05dWDe00qKipgbm6um8qIiMhoaHzOpGvXrti4caN06S4AfPXVV/Dy8tJJYURED9P2QxRJezQOk/nz52PixInYs2cPiouLERAQAGtra0RHR+uyPiIiiU4eokhaoXGYODg4YO/evbh8+TLS0tL4CHoiIpJoFCYqlQru7u5ITEyEm5sb3Nzc
dF0XEREZEY12K0xNTdGyZUvk5+fruh4iIjJCGh/mGjRoECZOnIi33noLzZo1q9SmyU2LRET076VxmOzYsQMAsHbt2krTTUxMcPz4ce1WRURERkXjMDlx4oQu6yAiIiPGS7GIiEg2jfdMfHx8qhxl8eTJk9qqh4iIjJDGYfLwI95zcnLw1VdfYeDAgVovioiIjIvGYeLp6fnYaWFhYRg9erRWiyIiIuMi65yJhYUFbt++ra1aiIjISGm8Z/LwuCWlpaX4z3/+g169emm9KCIiMi4ah0lmZmal1/Xq1UNoaCiCgoK0XhQRERkXjcPko48+0mUdRERkxDQ+Z7Jx40YkJSVVmpaUlIRNmzZpvSgiIjIuGofJV199BRcXl0rT2rRpg61bt2q9KCIiMi4ah0l5eTnMzCofFTM3N0dZWZnWiyIiIuOicZh06NAB33zzTaVpO3fuRPv27bVeFBERGReNT8C///77CA0NxYEDB9CiRQukpqYiJycHmzdv1mV9RERkBDQOk+effx5xcXE4efIkMjIy4O/vj969e8Pa2lqX9RERkRHQOEyysrJgZWWFV155RZp2584dZGVloWnTpjopjoiIjIPG50zeeeedR25czMzMxOTJk7VeFBERGReNw+TWrVtwdXWtNM3V1RV//fWX1osiIiLjonGY2NnZITk5udK05ORk2Nraar0oIiIyLhqfMxkyZAimTJmCqVOn4tlnn0VKSgpWr16N4OBgXdZHRPRUU5eVwdHRRit9OTraoKJUifxC7d8fqHGYjB8/Hubm5oiKikJWVhaaNWuGoUOHIjQ0tMZlV6xYgbi4OKSlpSE2NhZt27YFANy8eRMREREoKCiAra0tVqxYgZYtW8pqIyL6N1FYWOBM0BCt9ee9fy+ggzDR6DBXRUUFYmJicPXqVTg7O6Nv374YN24c3nrrLSgUNXfh6+uL7du3o3nz5pWmL1y4ECNGjEBcXBxGjBiByMhI2W1ERKR/NSZBYWEhQkJCsHLlSpibm6NDhw4wNzfHJ598gpCQEBQWFta4Eg8PDzg5OVWalpubi6tXryIwMBAAEBgYiKtXryIvL6/WbUREZBg1Hub6+OOPYWdnh6+++gr169eXphcXF2PatGn4+OOPsWjRoidecUZGBpo2bQpTU1MAgKmpKZo0aYKMjAwIIWrVZmdn98R1EBGRfDWGSXx8PHbv3l0pSADA2toakZGRCAkJqVWY1AX29g2qbdfWSS8iorpEF59tNYZJUVFRlXe4N2vWDEVFRbVasZOTE7KysqBSqWBqagqVSoXs7Gw4OTlBCFGrtieVm1sEtVo8ts3R0QY5OTUfwtMGhhYR6VNtP9sUCpMqv4TXeM6kRYsW+PHHHx/blpCQgBYtWtSqKHt7e7Rr1w4HDx4EABw8eBDt2rWDnZ1drduIiMgwagyT0NBQzJkzB3FxcVCr1QAAtVqNI0eO4P3338eYMWNqXMmSJUvQq1cvZGZmIjQ0VHq+16JFi7Bt2zYEBARg27Zt+OCDD6RlattGRET6V+Nhrtdffx0FBQWIiIjAjBkzYGtri4KCApibm+Pdd9/FkCE1X/88f/58zJ8//5Hpbdq0wZ49ex67TG3biIhI/zS6aXHs2LEYNmwYLl68iPz8fDRu3Bju7u5o0KD6E9hERPR00PgO+AYNGqBnz566rIWIiIyUxg96JCIiqgrDhIiIZGOYEBGRbAwTIiKSjWFCRESyMUyIiEg2hgkREcnGMCEiItkYJkREJBvDhIiIZGOYEBGRbAwTIiKSjWFCRESyMUyIiEg2hgkREcnGMCEiItkYJkREJBvDhIiIZGOYEBGRbAwTIiKSjWFCRESyMUyIiEg2hgkREcnGMCEiItkYJkREJBvDhIiIZGOYEBGRbGaGLsAYNbaxgJmVpaHLICKqM+pEmPTt2xcWFhawtLz/AT1z5kz07NkTly5dQmRkJJRKJZo3b46VK1fC3t4eAKpt0zUzK0ucCRqitf689+/VWl9E
RIZQZw5zrVmzBvv378f+/fvRs2dPqNVqzJo1C5GRkYiLi4OHhwdWrVoFANW2ERGR/tWZMHnYlStXYGlpCQ8PDwBASEgIjhw5UmMbERHpX504zAXcP7QlhECXLl0wffp0ZGRkwNnZWWq3s7ODWq1GQUFBtW22traGKJ+I6KlWJ8Jk+/btcHJyQllZGZYuXYrFixejX79+Ol+vvX2DatsdHW10XgMRkb7p4rOtToSJk5MTAMDCwgIjRozApEmT8NZbbyE9PV2aJy8vDwqFAra2tnBycqqy7Unk5hZBrRaPbXN0tEFOTmGVbURExqqqz7aaKBQmVX4JN/g5k5KSEhQW3n9jQggcPnwY7dq1Q8eOHVFaWorExEQAwM6dO9G/f38AqLaNiIj0z+B7Jrm5uZgyZQpUKhXUajXatGmDhQsXQqFQICoqCgsXLqx0+S+AatuIiEj/DB4mLVq0QExMzGPbOnfujNjY2CduIyIi/TL4YS4iIjJ+DBMiIpKNYUJERLIxTIiISDaGCRERycYwISIi2RgmREQkG8OEiIhkY5gQEZFsDBMiIpKNYUJERLIxTIiISDaGCRERycYwISIi2RgmREQkG8OEiIhkY5gQEZFsDBMiIpKNYUJERLIxTIiISDaGCRERycYwISIi2RgmREQkG8OEiIhkY5gQEZFsDBMiIpKNYUJERLIxTIiISDaGCRERycYwISIi2RgmREQkm1GHyc2bNzF8+HAEBARg+PDhuHXrlqFLIiJ6Khl1mCxcuBAjRoxAXFwcRowYgcjISEOXRET0VDIzdAG1lZubi6tXr2Lz5s0AgMDAQHz44YfIy8uDnZ2dRn0oFCa1brds4qh5sRrQdn+66PNp608Xfdb1/nTR59PWny761HZ/NX321WY5EyGEqG1BhnTlyhXMmTMHhw4dkqYNHDgQK1euRIcOHQxYGRHR08eoD3MREVHdYLRh4uTkhKysLKhUKgCASqVCdnY2nJycDFwZEdHTx2jDxN7eHu3atcPBgwcBAAcPHkS7du00Pl9CRETaY7TnTADgxo0biIiIwN27d9GwYUOsWLECrVu3NnRZRERPHaMOEyIiqhuM9jAXERHVHQwTIiKSjWFCRESyMUyIiEg2hslj5OXlGboEIgP3FR8AABWTSURBVCKjYrpo0aJFhi6irkhKSsKiRYuwY8cOZGdnw9nZGY0aNTJ0WRopKirCjh07UFFRgUaNGsHCwgJCCJiY1O4ZPLpmjPVu3rwZd+/ehY2NDerXrw+1Ws16dSgtLQ0NGzYEgDr9s/FAdnY2rK2tDV2GxrRdLy8N/v+EEAgPD4ebmxsGDRqEdevWISUlBVu2bDF0aTX6/vvv8cEHH8DNzQ316tWDUqnEZ599ZuiyqnTixAksXrzYaOo9e/YsFi5cCDc3NzRq1AhXr17Fzp07DV1WlRISEhAZGWk09T7s0KFD2Lp1K8zMzPDSSy8hLCwM9vb2hi6rSjExMdi5cycsLS3h5+eHoKAgNGzYsM4GoM7qFSSEEOLXX38VI0eOrDStf//+4ujRowaqqHp3794VQghRUVEhPvjgA3HixAkhhBCFhYXC399ffP3110IIIdRqtcFq/Kfbt2+L8vJyoVKpjKLeGzduiOPHjwshhIiOjhbbtm2T2oYMGSLWr19fZ2oVQojMzExRWFgohDCOeqtSWFgowsLCRHx8vMjPzxdTp04V4eHhhi6rkn9ux4yMDBEaGirOnj0rLl++LN555x0REREhhBBCpVIZqsQq6bJenjP5/55//nlcv34dN27ckKYFBQXhwIEDBqzqUYmJiQgPD8e0adOwfft2CCGQkpKC9PR0AECDBg0wbdo0/O///i8A1IlvRrGxsfD19cXZs2ehUCjw+++/IysrC0Ddq/fmzZuYOHEiZsyYgdTUVADAhQsXUFxcLM0zdepUnDx5EmlpaYYqU3LhwgVMnToV48ePx9SpU5GamoqbN2/izp070jx1qd6HlZSUIDc3V3r9008/
QaVSwdfXF7a2tli0aBF+//13JCQkGLDK/yotLUVJSYn0+tSpUxBCoFu3bujYsSMiIyNx+PBhpKSkQKEw/MdrcXExfv/9d+n16dOndVav4d9tHWFubo6AgABs375dmjZq1Cj89NNP0gefIanVaqxZswZLlixBv379MHr0aJw7dw5r1qzBK6+8gvPnz0vz9u/fHyqVCseOHQNw/xCeIWVnZ8POzg7x8fGoqKiAv78/zp07J7XXpXpjY2PRpk0b7Nu3D6NHjwYABAQE4OjRo9I8PXr0gEqlwo8//gjAcPVmZWVh3bp18PLywv79++Hg4IDPP/8c/fr1w/Hjx+tcvf9UVFSEJUuWICgoCMuWLcOnn34KAOjYsSOuXLmC/Px8AECjRo3Qq1cvxMbGAjDstl6yZAlGjBiBZcuWYdeuXVK9eXl5uHv3LgCgadOm8PHxwcaNGwFAehCtIeTl5eGNN95ARESE9OWiY8eOyM/P10m9DJN/CAwMxJkzZ5CTkwMAsLa2RqdOnfDnn38auDJAoVDA09MTH3/8MQIDA9GzZ0+4u7ujvLwcjo6OKCsrqxQor776Kk6fPg3AcN/2hRAoKytDXl4eoqOjcerUKVy5cgVubm5QKpWVAqUu1JudnY1r164hNDQUwP1vncnJyXBzc0PDhg0RHx8vzdu/f3/ptaHqvXLlCjIyMvDGG28AABwdHeHv7482bdrAxsamztX7TxcuXEB6ejp27tyJKVOmYP/+/YiNjUXTpk3Rs2dPfP3119K8o0aNwvHjx6FUKg1S+40bNzB58mTUr18fGzduRJcuXRATE4MbN26gadOmcHFxqTSu0jvvvIMTJ04AAExNTfVe7wMKhQKNGjXCvXv3cPHiRQD3jwS4uLjg8OHD0nzaqpdh8g8eHh7o1KkTIiMjkZKSgp07d6K8vBydO3c2dGkAgM6dO6NNmzYoKysDAFy/fh329vbw9vZG27Zt8eWXX0rzFhcXo1u3boYqFcD9Dy0LCwv8+uuvcHFxgY+PD+Lj42Fvbw8nJ6dKFzfUhXqbNGmCixcvIjExEbNnz0Z0dDRWrVqFzz//HH5+fli9erU0r0qlgre3twGrBXx9fVFUVIS5c+ciICAA3333HQ4dOoS1a9fC39+/TtV7+fJl/P3339Lro0ePonPnzrC3t0fLli3x7rvvYu/evcjKysLgwYMRGxsr/Zw7OzujY8eOuHXrlkHqdXJywvTp0zF9+nQ4ODiga9euaNy4MYqLi2FtbQ1PT0/Ex8ejvLwcAFC/fn14eHhIh571Xe8DFy5cQGBgIEJCQvDVV19J78XDwwPHjh3Ter0Mk4fMnz8fbm5umDNnDuLj4zFp0iTUr1/f0GUBACwsLKS/y8rKkJ6eDjc3N5iYmCAsLAy3b9/GBx98gLCwMFy4cAEtW7Y0bMG4/+3Z1dUV9erVQ9++fbF9+3a89dZb6NOnD1JSUupcvYMHD8batWvRu3dvbNu2DfPnz8e1a9fQtWtXNG/eHBEREQgNDUVMTAw6duxo6HJx4MABNG7cGBMnTkR8fDzef/99XL16FV26dKkT9d66dQsTJkxAcHAwtm3bBrVaDQBwcXGpdCguODgYd+/exS+//AIfHx8899xzWLBgAa5evYpVq1bB2toarq6ueq8XuP9h6+npKc1jaWmJtLQ0PPfcc7C0tETv3r1hamqKpUuXIi8vD5s3b0aDBg3g7OxskHofHAosKipCTk4ORo4cidu3byMxMRHJycnw9vaGubm59uuVdfr+XywvL8/QJVQrISGh0tVnJSUloqSkRBw5ckTs3LnTgJVV9scff4jBgweLd955R/j7+4tXXnlFLFiwQAghRE5Ojjh27FidqjcxMVG4urqKX375RZo2btw4sWfPHqFUKkVCQoL4v//7PwNWWFlFRYUYOnSouH37tjRt7NixYteuXaKsrMzg9aampordu3eLo0ePitdee01kZWUJIYRISUkRr732WqXtvGbNGjFu3DghxP2r
jjZv3iyCg4PF/PnzRWpqqkHqzc7OltoeXMW1Z88eMX369ErLpaSkiGXLlonAwEAxd+5ckZ6ebvB6o6KiRFJSkhBCiOHDhwtXV1fx4Ycf6qxehomReXD5XnR0tFi1apU4ffq0eOONN8TixYvr5KWfN27cEJMnTxYbN26UXnt5eYm0tDQDV1a16dOni5kzZ4rs7GwRGxsrRo0aJZKTkw1d1mOpVCoRHh4uXd4ZExMjRowYIa5fv27gyv7rwSXLQ4cOFVu3bhVCCKFUKsXy5cvFpEmTpPlOnTolpk+fLpRKpTTt3r17+i1WVK73q6++eqR9xowZ4tChQ0IIIW7evCnNL8T9L3X6VlW9y5YtEwMGDBADBgwQEyZMED4+Po/8HGuzXt4Bb2RMTExQWlqKiIgIJCUlIT8/H2FhYXjzzTfrxInVh9na2mLgwIHo0qULAKBx48bo0aMHWrVqZeDKqubt7Y3s7Gxs3rwZN2/exLhx4+rEIa3HMTExQevWrREXF4etW7ciKysLb7/9Ntzc3AxdmuTB4VlLS0vs3r0bAQEBsLa2Rvv27bF69WpYWlri7t27WLduHXx9ffHSSy9Jy5qZmRm03l27dsHf3x8WFhZQKBTIzMzEgQMH0KFDB2zatAm7d++Gm5sbmjZtCuD+VaGGrrd///5QKBS4du0amjdvjtmzZ2PkyJG4cOECkpOT0a1bN+kyYG3WyzvgjVBFRQXWr18PX19ftG/f3tDlaESlUkGhUNTJwKtKQUEBbG1tDV2GRoqLi1FYWIhmzZoZupRqDRo0CGFhYQgKCgJw/yTxf/7zHyQkJOD1119HSEiIgSus7OF6Y2JiEBERgV69esHPzw/Dhg0zcIWVDRo0COPGjcOrr776SFt+fj4aN26ss3UzTIhI51QqFUxNTXHgwAF8++236N27N5KSkursY3QerrdPnz5ISkrC5MmT8ccff2DgwIGGLrGSx9V76dIlrF69WnpMitDx4130vw9JRE+dB/cvZGRk4Ny5czAzM5NuCtX1h1xtPFyvqakpxowZgzZt2qBNmzYGru5R1W3fB9tW19uYYUJEepGQkICbN29i79696NChgzS9rgXJA1XVW1cZul4e5iIivaiLeyDVYb1PhmFCRESy8Q54IiKSjWFCRESyMUyIiEg2hgkREcnGMCHSgx07dmDMmDFPtMyyZcuwY8cOjef39vZGYmLiE1ZmXJKSkjBq1ChDl0GPwTAhnXN3d5f+vPDCC3Bzc5NeyxkWWalUwtXVFZmZmVXOU5sPcblu3Lgh+zE3WVlZiIuLw5AhQ6Rpd+7cwYcffggfHx+4u7ujX79+WLFiBQoKCuSWrFXDhg3D/v37ddL3gyEXzpw5o5P+qfZ40yLp3INR3gCgb9++WLJkCbp3727Aiuq+b7/9Fn5+ftJD/EpLSzFq1Cg0bdoUmzdvRsuWLZGXl4ft27fj6tWrWt2eD8YcMdQY5g8eDVKVQYMGYdeuXQYfnIwq454JGZxKpZKeGOvl5YUZM2ZIY1Tv27cPAQEBKCkpAQAcO3YMvXr1wp07d/Dmm28CuD8krbu7e6VhajVRUFCA2bNnw9vbGz4+Pvj888+lD9IdO3Zg9OjRWLJkCTw8PODn54ezZ89Ky966dQshISFwd3fHuHHjEBkZiXnz5gEARo4cCZVKJe19Xb16FcD9m8qq6u9hp06dQteuXaXX3377LQoKCrB27Vq0bt0aCoUCDg4OCA8PrxQkV65cQWBgILp06YIZM2ZIoxXm5eUhLCwML7/8Mjw9PTFp0iRkZ2dLyw0bNgyrV69GcHAwXnrpJWRnZ2Pnzp3Stu3Xrx++/fbbSjV+9913GDRoENzd3eHv74+EhAQsX74cly9fxvz58+Hu7o7ly5cDAK5du4a33noLXbt2xYABAyr9X02bNg1LlizB2LFj0alTJ1y8eBHx8fHSun18fKSRAgHAy8sLp06dMuj46vQYWnuYPZEG+vTpI86cOVNpWnR0tHjjjTdE
ZmamKC0tFXPmzJHG5xBCiMmTJ4sFCxaInJwc0a1bN3H69GkhhBClpaWibdu2IiMjo8r1ffPNN2L06NGPbRs3bpxYvHixKCkpEVlZWSIoKEgaSOqbb74R7du3F/v27RMVFRVi8+bNok+fPkKI+4MkBQUFiU8++UQolUrx448/ipdeeknMnTtXCCHE9evXRbt27R6po6r+HqdTp07i999/l15PmjRJGlSsKt27dxfDhw8XOTk5Ijc3V/j5+Ym9e/cKIf47ENm9e/fE3bt3xcSJE8XUqVOlZYODg0Xfvn3FjRs3RFlZmSgvLxfx8fEiJSVFqNVqcebMGfHiiy+Ka9euCSGEOHfunPDw8BAJCQlCpVKJtLQ08ddff0l9xcTESH0XFhYKb29vsX//flFRUSF++eUX0bVrV3Hr1i0hhBBTp04VXbt2FZcuXRIqlUoolUrh6ekpDZyVl5cnfv3110rvtUOHDtL6qG7gngkZ3M6dOzFjxgw0bdoUlpaWePfdd3H48GFp+NHFixfjxIkTGDNmDAYOHKiVwxtpaWlITExEREQE6tWrhyZNmmDUqFE4dOiQNE+rVq3w6quvwtTUFK+++irS0tJw9+5d3Lp1C3/99RfeffddWFhYwMvLC7169apxnVX19zCVSoWSkhJYW1tL0woKCuDo6FjjOsaMGQMHBwfY2dnBx8cHv/32GwDAwcEBfn5+sLKygo2NDSZMmIDz589XWjY4OBitW7eGubk5zMzM4OvrixYtWsDExATdu3dH165d8fPPPwO4v6cUEhKCl19+GQqFAs7OzlWOUXPs2DE8//zzGDx4MExNTeHm5oY+ffogLi5OmicgIAAvvfQSFAqFNHbIn3/+iaKiIjRu3PiRc1DW1tYoLCyscXuQ/vCcCRmUEAKZmZkYP358pecKqdVq5Ofnw87ODo0bN0a/fv2wY8cOREdHa2W96enpUCqV6NatW6V1Pvfcc9JrBwcH6d/16tUDAJSUlCA7Oxt2dnbS+QwAaNasGYqLi6tdZ1X9NWzYsNJ8pqamsLa2rtSfra0tcnJyanxf/1yHlZUV8vPzAdwfD3zp0qVISEiQAkypVFZa9uGxUI4fP47169cjJSUFarUapaWl8PDwAHD/6bQP/l2TtLQ0nD9/vtL8KpUKQ4cOrXLdX3zxBTZs2IDly5ejXbt2mDlzZqUBv4qLi2FjY6PR+kk/GCZkUCYmJmjatCnWrl1b5WiGSUlJOHToEPr3748lS5Zg/fr10rK11axZM9SvXx/nz59/4n4cHR2Rl5eHsrIyKVAyMzOlDzdtPGzP1dUVt27dgqurKwCge/fu2LRpE5RKJSwtLZ+4v02bNiErKwvffvstHBwccOnSJYwYMaLSPP+su6SkBOHh4VizZg169eoFMzMzjBs3TtpbdHJyQkpKymPX9fD7d3JyQo8ePbBhw4Yq63t4GXd3d0RHR6OsrAxbtmzBzJkzcfToUQBASkoKzM3N8eyzz2q+AUjneJiLDC4kJAQff/wxMjIyAAC5ubk4ceIEAODevXuYNWsW5syZg+XLl+PmzZvSiWALCwvY2NggNTW12v7VajWUSqX0p6ysDC1atECnTp0QFRWFoqIiqNVq3Lp1S6P7NFq1aoVWrVrhiy++QHl5Oc6fP49Tp05J7fb29lCpVEhPT6/tJoGPj0+lw1BDhw5Fo0aNEB4ejps3b0IIgby8PHz++edISEiosb/i4mJYWVmhYcOGyMvLkwK5KqWlpaioqIC9vT0UCgWOHz+Oc+fOSe3BwcHYtWsXzp8/D7VajYyMDNy8eRPA/ff/z/8TPz8/XL16FYcPH0Z5eTnKyspw6dIlaf6HlZSU4NChQygqKoK5uTmsra0rhc25c+fg7e1d7RVfpH8MEzK4sLAwdOvWDaNHj4a7uztCQkKkK6CWL1+ONm3aYMiQIbCyskJUVBSioqJw+/ZtAMB7772H8PBweHh44Pjx44/t/6effoKbm5v0p1OnTgCAjz/+GIWF
hRgwYAA8PT0xbdo05Obm1liviYkJPv30UyQkJMDT0xMbNmxA//79pb2URo0aISwsDK+//jo8PDyk8xZP4rXXXkN8fLx0NZaVlRW+/vprODs7Y/To0ejcuTOGDx+OkpISje5pGTt2LPLz8+Hl5YURI0bUeI7Hzs4Oc+bMwcSJE+Hl5YX4+Hj4+PhI7R4eHli0aBE++OADdOnSBWPGjJHu9xkzZgz279+Prl27IioqCo0aNcKXX36JvXv3okePHujZsyc+++wzVFRUVLn+vXv3ok+fPujSpQtiYmIQFRUltcXGxta54X2Jj6An0opJkyahU6dOmDBhgtb6/Oijj9CyZUu88cYbWuvT2CUlJSEqKgrbtm0zdCn0EIYJUS388ssvsLe3h7OzM06ePIn33nsPMTExcHFxMXRpRAbBE/BEtZCZmYkpU6bg7t27aNasGZYtW8Ygoaca90yIiEg2noAnIiLZGCZERCQbw4SIiGRjmBARkWwMEyIiko1hQkREsv0/CC7NzgSj7BoAAAAASUVORK5CYII=\n"},"metadata":{},"output_type":"display_data"}]},{"cell_type":"markdown","source":"The text lengths are relatively evenly distributed. There are more shorter texts in the dataset but not by a large degree. The color of the graph has been changed to red because I like red better.","metadata":{"tags":[],"cell_id":"0e8c10bb-d7b8-409b-9bfd-b7843f8a7d6f"}},{"cell_type":"markdown","source":"There was some valuable insight there on the character level. Now lets try and analyze the text length on a word level","metadata":{"tags":[],"cell_id":"a7173779-a7ff-4fc6-a65b-8fa449505446"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"5f81ad5a-e523-4432-872c-179996582498"},"source":"df['num_words'] = df.text.apply(lambda x: len(str(x).split()))\r\ndf[\"num_words\"]","outputs":[{"output_type":"execute_result","execution_count":15,"data":{"text/plain":"0         7\n1        10\n2         5\n3         5\n4        14\n         ..\n27476    16\n27477    23\n27478    22\n27479     6\n27480    11\nName: num_words, Length: 27481, dtype: int64"},"metadata":{}}]},{"cell_type":"code","metadata":{"tags":[],"cell_id":"e64a3118-c3bb-4120-9ab6-48848abb6678"},"source":"df.num_words.plot(kind=\"hist\",bins = 16,color=\"r\")\r\nplt.title(\"Number of words in tweet\")\r\nplt.xlabel(\"Text Length (Words)\")\r\nplt.ylabel(\"Occurences\")\r\nplt.xticks(rotation=25)\r\nsns.set(style=\"darkgrid\")","outputs":[{"data":{"text/plain":"<Figure size 432x288 with 1 
Axes>","image/png":"iVBORw0KGgoAAAANSUhEUgAAAZMAAAEhCAYAAAC6Hk0fAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3de1iUdf7/8SeDDiiiCKKAulqaiLKsKKLmMQ8ruqwdzCKzTNPKPLSWlW6GpqmBh9zMY1nfSpNNS8FDouba0UzynKXZqiWgKGgChujM/P7w572RBwYGZhh9Pa7L6/L+fO7De8ZxXnPfn/vgYbPZbIiIiDjA5OoCRETE/SlMRETEYQoTERFxmMJEREQcpjARERGHKUxERMRhChO5oYwdO5ZXX33VJdu22WyMGzeO1q1bc++997qkhstCQ0M5evRoiZZJSUlh8ODB5VSR3OgUJlKuunbtSrt27Th37pzRtnz5ch566CEXVlU+vv32W7788ks+/fRTVqxY4epySqxPnz689dZbpVp2zpw5jBkzpowrur5jx44RGhrKxYsXnbpduTqFiZQ7q9XKu+++6+oySsxisZRo/vT0dOrWrUvVqlXLqaIr6YtUKgqFiZS7Rx99lLfeeouzZ89e0Xe1X5cPPfQQy5cvB+Cjjz4iLi6OqVOnEhUVRbdu3dixYwcfffQRnTt3pl27dqxcubLIOk+fPs2gQYOIjIxkwIABpKenG30//fQTgwYNIjo6mp49e7Ju3Tqjb+zYsUyYMIGhQ4fSokULtm3bdkW9J06c4IknniA6OpoePXrwwQcfAJf2tsaPH8+uXbuIjIzktddeu2LZO+64g3379gGXDimFhoby448/Gss/+eSTABQWFjJlyhQ6dOhAhw4dmDJlCoWFhQBs27aNTp06sWjRItq3b8+4ceMAePPNN435/7hX9Omnn9K7d28iIyPp2LEjixcvvuq/00cffcQDDzxgTIeGhrJs2TL++te/EhUVxUsvvcTVbpjx2WefsXDhQj7++GMiIyPp06cPX3/9NX//+9+NeQYNGkTfvn2N6f79+7Np0ybjPR05ciRt27ala9euRX54WK1WFi1aRPfu3WnTpg1PPfUUZ86cAWDAgAEAtG7dmsjISHbu3HnV1yXOoTCRchceHk50dPQ1v8SKs2fPHkJDQ9m2bRuxsbE8/fTT7N27l40bNzJ9+nQmTZpEfn6+Mf/q1at58skn2bZtG02bNjUOv5w7d47BgwcTGxvLV199xauvvspLL73EoUOHjGXXrFnDE088wY4dO2jVqtUVtTz99NMEBQXx+eef89prrzFr1iy2bt1Kv379eOmll2jRogU7d+5k1KhRVyzbunVrvvnmGwC2b99O/fr12b59uzEdHR0NwPz589m9ezfJycmkpKSwd+9e5s2bZ6zn1KlT/Prrr/znP/9h8uTJfPbZZ7z11lu89dZbbNiwga1btxbZ7gsvvMCkSZPYuXMna9asoW3btna/91u2bGHFihWkpKTw8ccf8/nnn18xT6dOnXj88cfp1asXO3fuJCUlhRYtWnDkyBFycnK4cOECBw4cICsri7y8PAoKCti3bx+tWrXCarUybNgwQkND+eyzz3jnnXd45513jO289957bNq0iSVLlvD5559To0YNJk2aBMCSJUuM927nzp1ERkba/bqk7ClMxClGjRrFkiVLyMnJKfGy9erVo2/fvnh6etK7d28yMzMZPnw4ZrOZDh06YDab+fnnn435u3TpQuvWrTGbzYwePZpdu3aRmZnJli1bqFu3Ln379qVSpUo0a9aMnj17sn79emPZbt260apVK0wmE15eXkXqyMzMZMeOHYwZMwYvLy/CwsLo168fycnJdr2O34dJWloajz/+eJEwad26NXApDIcPH05AQAD+/v4MHz6clJQUYz0mk4lRo0ZhNpvx9vbm448/5p577qFJkyZUrVqV
ESNGFNlupUqVOHToEHl5edSoUYPmzZvb/d4PHTqU6tWrExISQps2bfjhhx/sWs7b25s///nPpKWl8d1339G0aVNatmzJjh072LVrFw0aNKBmzZrs3buXnJwcRowYgdlspn79+tx3333GHmNSUhKjR48mKCgIs9nMiBEjSE1N1eG9CqiSqwuQm0OTJk3o0qULixYtolGjRiVaNiAgwPi7t7c3ALVq1TLavLy8iuyZBAUFGX/38fGhRo0aZGVlkZ6ezp49e4iKijL6LRYLffr0MaaDg4OvWUdWVhY1atSgWrVqRltISIhx6Ko40dHRJCYmkpWVhdVqpVevXrz++uscO3aM3NxcwsLCjO2EhIQU2UZWVpYxXbNmzSJBl5WVRXh4uDFdt27dItt97bXXmD9/PjNnziQ0NJRnnnnG7l/xgYGBxt+rVKlS5H0uzuXwrFOnDq1bt6Z69eps374ds9ls7IWlp6eTlZV1xb/J5emMjAyGDx+OyfS/370mk4ns7Gy76xDnUJiI04waNYq77767yOmnlwerCwoKjC/pkydPOrSd48ePG3/Pz8/n119/pXbt2gQHB9O6dWvefvvtUq23du3a/Prrr+Tl5Rm1ZmZmUqdOHbuWb9CgAd7e3ixZsoSoqCiqVatGrVq1+OCDD4y9ocvbycjI4LbbbjO2Ubt2bWM9Hh4eV9SVmZlpTGdkZBTpj4iIYP78+Vy4cIGlS5fyj3/8g08//bTkb8B1/LEmuBSer7zyCiEhIQwdOpQaNWrw4osvUrlyZR588EHgUnjXq1ePDRs2XHW9QUFBTJ069aqHHH8/Fiaup8Nc4jQNGjSgd+/evPfee0abv78/derUITk5GYvFwooVK/jll18c2s6nn35KWloahYWF/Otf/+Ivf/kLwcHBdOnShSNHjrBq1SouXLjAhQsX2LNnDz/99JNd6w0ODiYyMpJZs2Zx/vx5fvjhB1asWFFkz6Y40dHRLFmyxDik9cdpgL/97W/Mnz+fnJwccnJymDt3bpHB7D+KiYlh5cqVHDp0iN9++43XX3/d6CssLCQlJYXc3FwqV66Mj49PkV/5ZSUgIID09HSsVqvRFhkZyeHDh9mzZw8RERHcdtttxt7h5dcbERGBj48PixYtoqCgAIvFwsGDB9mzZw8ADzzwALNnzzaCIycnxxi49/f3x2QyOfx5kbKhMBGnGj58eJFrTgAmT57M4sWLadOmDYcOHXJ4IDU2Npa5c+fSpk0bvvvuO6ZPnw5AtWrVWLx4MevWraNjx4506NCBGTNmGGdK2WPWrFmkp6fTsWNHRowYwciRI7n99tvtXr5169bk5+cXCZPfTwM8+eSThIeH06dPH/r06UPz5s2NM72upnPnzgwcOJCBAwfSo0ePKwbYk5OT6dq1Ky1btiQpKcl4P8pSTEwMAG3atOHuu+8GLu11Nm/enMaNG2M2m4FLARMSEmIcuvT09GTBggX88MMPdOvWjbZt2zJ+/Hjy8vIAePjhh+natSuDBw8mMjKS++67zwiaKlWq8MQTT/DAAw8QFRXFrl27yvx1if089HAsERFxlPZMRETEYQoTERFxmMJEREQcpjARERGHKUxERMRhChMREXHYTX0F/OnT+Vit/zszOiCgGtnZeS6syDGq33XcuXZw7/rduXZwr/pNJg9q1vS5at9NHSZWq61ImFxuc2eq33XcuXZw7/rduXZw//pBh7lERKQMKExERMRhChMREXGYwkRERBymMBEREYcpTERExGEKExERcdhNfZ2JXFtNXzOVvL2Kn9EOFwvOczrX/gdQiYj7UZjIVVXy9uLLO/uWybraJ38IChORG5oOc4mIiMOctmfy5JNPcuzYMUwmE1WrVuXFF18kLCyMw4cPM3bsWM6cOYOfnx8JCQk0bNgQoNR9IiLiXE7bM0lISCAlJYVVq1YxePBg/vnPfwIwYcIE+vfvT2pqKv379yc+Pt5YprR9IiLiXE4LE19fX+PveXl5eHh4kJ2d
zf79+4mNjQUgNjaW/fv3k5OTU+o+ERFxPqcOwL/wwgt8+eWX2Gw23nzzTTIzM6lTpw6enp4AeHp6Urt2bTIzM7HZbKXq8/f3d+ZLEhERnBwmU6ZMAWDVqlUkJiby1FNPOXPzVwgIqHZFW2Cg71XmdB8VtX5766qo9dvDnWsH967fnWsH968fXHRq8F133UV8fDxBQUGcOHECi8WCp6cnFouFrKwsgoODsdlspeoriezsvCLPEQgM9OXkydyyfrlOU5b1l/WH25663Pn9d+fawb3rd+fawb3qN5k8rvojHJw0ZpKfn09mZqYxvXnzZmrUqEFAQABhYWGsWbMGgDVr1hAWFoa/v3+p+0RExPmcsmfy22+/8dRTT/Hbb79hMpmoUaMGCxYswMPDg4kTJzJ27FjmzZtH9erVSUhIMJYrbZ+IiDiXU8KkVq1afPDBB1fta9SoEcuXLy/TPhERcS7dTkXKnbWwsMwG4HWfL5GKSWEi5c5kNus+XyI3ON2bS0REHKYwERERhylMRETEYRozuYGUZKBbRKQsKUxuIGU+0C0iYicd5hIREYcpTERExGEKExERcZjCREREHKYwERERhylMRETEYQoTERFxmMJEREQcpjARERGHKUxERMRhChMREXGYwkRERBymMBEREYfprsHiVsryNvt6nrxI2VGYiFvR8+RFKiYd5hIREYcpTERExGEKExERcZhTxkxOnz7Nc889x88//4zZbKZBgwZMmjQJf39/QkNDadKkCSbTpVxLTEwkNDQUgM2bN5OYmIjFYqF58+ZMmzaNKlWqFNsnIiLO5ZQ9Ew8PD4YMGUJqaiqrV6+mfv36zJgxw+hPSkoiOTmZ5ORkI0jy8/N58cUXWbBgARs3bsTHx4fFixcX2yciIs7nlDDx8/OjTZs2xnSLFi3IyMi47jKfffYZ4eHhNGzYEIC4uDg+/vjjYvtERMT5nH5qsNVqZdmyZXTt2tVoe+ihh7BYLHTq1ImRI0diNpvJzMwkJCTEmCckJITMzEyA6/aVREBAtSvayuoaBnEPZfnv7e6fHXeu351rB/evH1wQJpMnT6Zq1aoMGDAAgC1bthAcHExeXh7PPvssc+fOZfTo0U6pJTs7D6vVZkwHBvpy8mSuU7ZdHm6ED6SzldW/943w2XHX+t25dnCv+k0mj6v+CAcnn82VkJDA0aNHmT17tjHgHhwcDEC1atXo168fO3bsMNp/fygsIyPDmPd6fSIi4nxOC5NZs2axb98+5s6di9lsBuDXX3+loKAAgIsXL5KamkpYWBgAHTt2ZO/evRw5cgS4NEjfq1evYvtERMT5nHKY68cff2ThwoU0bNiQuLg4AOrVq8eQIUOIj4/Hw8ODixcvEhkZyVNPPQVc2lOZNGkSjz/+OFarlbCwMF544YVi+0RExPmcEia33XYbBw4cuGrf6tWrr7lc9+7d6d69e4n7RETEuXQFvIiIOExhIiIiDlOYiIiIwxQmIiLiMD0cS25aZfnURmuhHrIlNzeFidy0yvypjZwvk3WJuCMd5hIREYcpTERExGEKExERcZjCREREHKYwERERhylMRETEYTo12MVq+pqp5O3l6jJERByiMHGxSt5eZXytg4iI8+kwl4iIOExhIiIiDlOYiIiIwxQmIiLiMA3Ai5SBsrwD8cWC85zO1V2Ixb0oTETKQJnfgVhhIm7G7jA5dOgQfn5+1KpVi/z8fBYvXozJZOLRRx+lSpUq5VmjiIhUcHaPmTz99NOcPXsWgISEBLZv386uXbuIj48vt+JERMQ92L1nkp6ezq233orNZmPjxo2sXbsWb29vunXrVp71iYiIG7A7TLy8vMjLy+Onn34iODgYf39/Ll68yPnzerqciMjNzu4wiY2NZeDAgeTn5zNgwAAA9u/fT7169Ypd9vTp0zz33HP8/PPPmM1mGjRowKRJk/D39zcOlZ0/f566desyffp0AgICAErdJyIizmX3mMk///lPRo8ezcSJE40w
8fDwYNy4ccUu6+HhwZAhQ0hNTWX16tXUr1+fGTNmYLVaefbZZ4mPjyc1NZWoqChmzJgBUOo+ERFxvhJdtNihQwcaNGjArl27APjzn/9Mu3btil3Oz8+PNm3aGNMtWrQgIyODffv24eXlRVRUFABxcXGsX78eoNR9IiLifHaHSUZGBnFxcfTq1YtBgwYBsH79el544YUSbdBqtbJs2TK6du1KZmYmISEhRp+/vz9Wq5UzZ86Uuk9ERJzP7jGT+Ph4unTpwvvvv2/sZbRv356EhIQSbXDy5MlUrVqVAQMGsHHjxpJVW8YCAqpd0VZWVzGLOMIVn0N3/uy7c+3g/vVDCcJk7969LFq0CJPJhIeHBwC+vr7k5ubavbGEhASOHj3KggULMJlMBAcHk5GRYfTn5ORgMpnw8/MrdV9JZGfnYbXajOnAQF9OnrT/9ZSFG+FDJGXPFZ9DZ2+zrLhz7eBe9ZtMHlf9EQ4lOMwVEBDA0aNHi7QdOnSI4OBgu5afNWsW+/btY+7cuZjNZgDCw8MpKCggLS0NgKSkJGJiYhzqExER57N7z2Tw4ME88cQTPPbYY1y8eJE1a9awcOFChg4dWuyyP/74IwsXLqRhw4bExcUBUK9ePebOnUtiYiITJkwocoovgMlkKlWfiIg4n91hcu+99+Ln58e///1vgoODWbVqFU899RTdu3cvdtnbbruNAwcOXLWvZcuWrF69ukz7RETEuUp01+Du3bvbFR4iInJzsXvM5OWXX2bHjh1F2nbs2MGUKVPKvCgREXEvdu+ZrFmzhueee65IW3h4OMOHDy/xtSYicm160Ja4I7vDxMPDA5vNVqTNYrFgtVrLvCiRm5ketCXuyO7DXFFRUcyePdsID6vVypw5c4xbmoiIyM3L7j2TF154gccff5wOHToQEhJCZmYmgYGBLFiwoDzrExERN2B3mAQFBbFy5Up2797N8ePHCQ4OJiIiApOpRPeKFBGRG1CJTg02mUxERkYWGSexWq0KFBGRm5zdYfLdd98xadIkDhw4YDxd0Waz4eHhwffff19uBYqISMVnd5iMHTuWO+64g6lTp+Lt7V2eNYlIGSnJacbFzafTjOV67A6T9PR0Ro8ebdwxWEQqPp1mLM5i92BHjx49+OKLL8qzFhERcVN275mcP3+eESNG0KpVK2rVqlWkLzExscwLExER92F3mDRu3JjGjRuXZy0iIuKm7A6TESNGlGcdIiLixkp0ncmXX37J2rVrycnJYcGCBezdu5e8vDzatWtXXvWJiIgbsHsA/r333mPixIk0bNiQ7du3A+Dt7c2//vWvcitORETcg91h8s477/D222/z2GOPGVe833rrrRw+fLjcihMREfdg92Gu/Px8goODAYxrTS5evEjlypXLpzIRqVD0nBW5HrvDpHXr1ixatIhhw4YZbe+++y5t2rQpl8JEpGLRBZByPXaHyfjx43niiSdYvnw5+fn59OzZEx8fHxYuXFie9YmIiBuwO0xq1arFhx9+yN69e0lPT9ct6EVExGBXmFgsFiIjI0lLSyMiIoKIiIjyrktERNyIXbsVnp6eNGzYkNOnT5d3PSIi4obsPsz197//nSeeeIKHH36YoKCgIn32XLSYkJBAamoq6enprF69miZNmgDQtWtXzGYzXl5eAIwZM4aOHTsCsGvXLuLj4zl//jx169Zl+vTpBAQEFNsnIiLOZXeYLFu2DIA5c+YUaffw8OCTTz4pdvlu3brx8MMP8+CDD17R99prrxnhcpnVauXZZ59l2rRpREVFMW/ePGbMmMG0adOu2yciIs5nd5hs3rzZoQ1FRUWVaP59+/bh5eVlLBcXF0e3bt2YNm3adftERMT5SnRvrvIyZswYbDYbrVq14umnn6Z69epkZmYSEhJizOPv74/VauXMmTPX7fPz83PFSxARuanZHSadO3e+5lMWt2zZUuoCli5dSnBwMIWFhUyZMoVJkyYxY8aMUq+vJAICql3RVlZX+IrI
9ZXl/zV3/3/r7vVDCcJk+vTpRaZPnjzJu+++S+/evR0q4PItWsxmM/379zeusA8ODiYjI8OYLycnB5PJhJ+f33X7SiI7Ow+r1WZMBwb6cvJkriMvp8RuhA+RSGmU1f81V/y/LUvuVL/J5HHVH+FQgjCJjo6+atuQIUMYOHBgqQo7d+4cFosFX19fbDYb69atIywsDIDw8HAKCgpIS0sjKiqKpKQkYmJiiu0TERHnc2jMxGw2c+zYMbvmffnll9mwYQOnTp1i0KBB+Pn5sWDBAkaOHInFYsFqtdKoUSMmTJgAgMlkIjExkQkTJhQ5/be4PhERcT67w+SPzy0pKCjg008/pVOnTnYtP378eMaPH39F+6pVq665TMuWLVm9enWJ+0SkYivLOxBbC3XDyIrA7jA5fvx4kekqVaowaNAg7rzzzjIvSkRubGV+B2LOl8m6pPTsDhNdwyEiItdi9y1/Fy1axJ49e4q07dmzhzfeeKPMixIREfdid5i8++67NG7cuEhbo0aNeOedd8q8KBERcS92h8mFCxeoVKnoUbHKlStTqMEvEZGbnt1h0rx5c95///0ibUlJSTRr1qzMixIREfdi9wD8uHHjGDRoECkpKdSvX59ffvmFkydP8vbbb5dnfSIi4gbsDpPbbruN1NRUtmzZQmZmJn/961/p0qULPj4+5VmfiIi4AbvD5MSJE3h7e/O3v/3NaPv11185ceIEderUKZfiRETEPdg9ZvLkk09eceHi8ePHGTFiRJkXJSIi7sXuMDly5AihoaFF2kJDQ/nvf/9b5kWJiIh7sTtM/P39OXr0aJG2o0eP6mFUIiJif5j07duXkSNHsnnzZg4dOsTmzZsZNWoU/fr1K8/6RETEDdg9AP/YY49RuXJlEhMTOXHiBEFBQdx7770MGjSoPOsTERE3YFeYXLx4kZSUFPbv309ISAjNmzenXbt23HnnnZhMdu/ciIjIDarYJMjNzSUuLo7p06dTuXJlmjdvTuXKlZk1axZxcXHk5rrH4yZFRKT8FLtnMnPmTPz9/Xn33XepWrWq0Z6fn8/o0aOZOXMmEydOLM8aRUSkgit2z2TTpk1MnDixSJAA+Pj4EB8fz6ZNm8qtOBERcQ/FhkleXt41r3APCgoiLy+vzIsSERH3UmyY1K9fn6+//vqqfVu3bqV+/fplXpSIiLiXYsNk0KBBPP/886SmpmK1WgGwWq2sX7+ecePG8cgjj5R3jSIiUsEVOwB/zz33cObMGcaOHcszzzyDn58fZ86coXLlygwfPpy+ffs6o04REanA7LrOZPDgwdx3333s3LmT06dPU7NmTSIjI6lWrVp51yciIm7A7ivgq1WrRseOHcuzFhERcVO6fF1ERBzmlDBJSEiga9euhIaGcvDgQaP98OHD3H///fTs2ZP777+fI0eOONwnIiLO55Qw6datG0uXLqVu3bpF2idMmED//v1JTU2lf//+xMfHO9wnIiLO55QwiYqKIjg4uEhbdnY2+/fvJzY2FoDY2Fj2799PTk5OqftERMQ17B6AL2uZmZnUqVMHT09PADw9PalduzaZmZnYbLZS9fn7+7vq5YiIi1gLCwkM9C2TdV0sOM/p3MIyWdfNxmVhUhEEBFx5anNZfShFxDlMZjNf3lk217u1T/6QQG+vMllXSdwI3zsuC5Pg4GBOnDiBxWLB09MTi8VCVlYWwcHB2Gy2UvWVVHZ2HlarzZgODPTl5Enn3lL/RvgQidxIXPEd4OxtlpbJ5HHVH+HgwlODAwICCAsLY82aNQCsWbOGsLAw/P39S90nIiKu4ZQ9k5dffpkNGzZw6tQpBg0ahJ+fH2vXrmXixImMHTuWefPmUb16dRISEoxlStsnIiLO55QwGT9+POPHj7+ivVGjRixfvvyqy5S2T0REnE9XwIuIiMMUJiIi4jCFiYiIOExhIiIiDlOYiIiIwxQmIiLiMIWJiIg4TGEiIiIOU5iIiIjDFCYiIuIwhYmIiDhMYSIi
Ig5TmIiIiMNu6ictllZNXzOVXPA0NhGRikphUgqVvL3K9DGhIiLuToe5RETEYQoTERFxmMJEREQcpjARERGHKUxERMRhChMREXGYwkRERBymMBEREYcpTERExGEV4gr4rl27Yjab8fK6dIuSMWPG0LFjR3bt2kV8fDznz5+nbt26TJ8+nYCAAIDr9omIiHNVmD2T1157jeTkZJKTk+nYsSNWq5Vnn32W+Ph4UlNTiYqKYsaMGQDX7RMREeerEHsmV7Nv3z68vLyIiooCIC4ujm7dujFt2rTr9omIlJa1sJDAQN8yWdfFgvOczi0sk3W5gwoTJmPGjMFms9GqVSuefvppMjMzCQkJMfr9/f2xWq2cOXPmun1+fn6uKF9EbgAms7lsb+KqMHGupUuXEhwcTGFhIVOmTGHSpEn06NGj3LcbEFDtiray+lUiImLv98mN8L1TIcIkODgYALPZTP/+/Rk2bBgPP/wwGRkZxjw5OTmYTCb8/PwIDg6+Zl9JZGfnYbXajOnAQF9Onswtdrkb4R9eRMqfvd8n9sxXEZhMHlf9EQ4VYAD+3Llz5OZeeiNtNhvr1q0jLCyM8PBwCgoKSEtLAyApKYmYmBiA6/aJiIjzuXzPJDs7m5EjR2KxWLBarTRq1IgJEyZgMplITExkwoQJRU7/Ba7bJyIizufyMKlfvz6rVq26al/Lli1ZvXp1iftERMS5XH6YS0RE3J/CREREHKYwERERhylMRETEYQoTERFxmMJEREQcpjARERGHKUxERMRhChMREXGYwkRERBymMBEREYcpTERExGEKExERcZjCREREHKYwERERhylMRETEYQoTERFxmMJEREQc5vLH9oqI3IishYUEBvraNW9x810sOM/p3MKyKKvcKExERMqByWzmyzv7lsm62id/CBU8THSYS0REHKYwERERhylMRETEYQoTERFxmMJEREQc5tZhcvjwYe6//3569uzJ/fffz5EjR1xdkojITcmtTw2eMGEC/fv358477yQ5OZn4+HjeffddV5clIlKmSnLNSnHK65oVtw2T7Oxs9u/fz9tvvw1AbGwskydPJicnB39/f7vWYTJ52NV2NV61A+0vVuvSurQurcsBJrOZtKFPlMm6ot5YgH/BaTgAABH8SURBVCn/QunquM73o4fNZrOVtihX2rdvH88//zxr16412nr37s306dNp3ry5CysTEbn5uPWYiYiIVAxuGybBwcGcOHECi8UCgMViISsri+DgYBdXJiJy83HbMAkICCAsLIw1a9YAsGbNGsLCwuweLxERkbLjtmMmAD/99BNjx47l7NmzVK9enYSEBG699VZXlyUictNx6zAREZGKwW0Pc4mISMWhMBEREYcpTERExGEKExERcZjC5P/LyclxdQkOOXbsGCdOnHB1GSJyk7rpw2TPnj0MGzaMIUOGMGfOHH755RdXl1QiO3fu5B//+AejRo0iOzvb1eU45Ny5c0Veg9VqdWE1JZOXl0daWhq//fabq0spsby8PN555x3S0tLIz88HwJ1O8szPz+fVV19l5cqVbvWZuSwvL4+FCxeyefNmTp06BbjXZ/+ymzpMbDYbb775Jq1atWL+/PmcPHmSF1980dVl2W3SpEmMHDmS9u3b89FHH9GsWTNXl1QqeXl5vPzyy9x5551MnTqVf/3rXwCYTBX/43nixAlmzJhB3759SUtLw9PT09Ullch//vMfYmNj+fbbb1m+fDkvvPACAB4e9t3w1FVsNhtnz55l/PjxDB06lMzMTKKjo93iM/N7W7du5e677+bgwYN88cUXjBgxAnCPz/4VbDex7777zjZgwIAibTExMbYNGza4qKKSSUpKsj344IPG9Pfff2+7cOGCCysqnU8//dQ2bNgw26lTp2yHDx+2de7c2bZ69WpXl1Wsw4cP2yIiImzx8fG2U6dOubocu509e9Zms9lsFy9etL300ku2zZs322w2my03N9f217/+1fbee+/ZbDab
zWq1uqzG67lc//fff28LDQ21paenu7iikvn9Z2XhwoW2JUuWGNN9+/a1zZ8/v8K+99fjOXHixImuDjRX8fPzY+bMmdxxxx3GbVjy8vL44osv6N27t4urK154eDiLFy9m3759vPnmm3z99dds27aN7OxswsPDXV3eNe3duxdPT0+qVq0KwBtvvEFYWBgdOnTAz88PHx8fVq5cye233061atVcXG1Rv6/dz8+PLVu2cOedd/LnP/+ZQ4cOkZ6eTmBgYIX8ZZ+WlkZCQgKrV68mNzeX5s2bs3TpUmrVqkVERARms5nAwEBef/11Bg4cWOFew+/rP3v2LF27dmXbtm2YTCYCAwOZO3cux48fN/5tKpodO3aQmJjIv//9bw4cOEB4eDgpKSlUqVKF1q1bAxASEsIHH3xAu3btqF69uosrLpmbOkw8PT05duwY+/bto3PnzgCEhYUxZcoU+vTpU+G+yK7GYrHw3XffMW7cOB5++GEKCgpYsGABd911F15eXq4ur4gjR47w/PPPM336dMxmM9HR0Xh4ePDzzz+zceNG7r33XgCaN2/OkiVLqFevHo0aNcJms7n8i+1atVepUoWEhAS+/vprUlJS+P7779m0aRPNmjWjRo0aFaJ2q9XKnDlzeOedd+jXrx9t2rRh48aNHDhwgOjoaL766itiYmIAaNy4MW+99VaFeu+vVf+JEye49957GTVqFPv376dp06bs3LmTTZs20bBhQ+rUqePSun9vzpw5vP/++8TExPDkk0+ybt06fvjhBzp16sQHH3zA/fffD8Cf/vQn/v3vf+Pn50ezZs0qxPtvr5s6TACqVavG//3f/xETE4OPjw9ms5nt27dTv359/vSnP7m6vGJFRETQu3dvgoKCqFSpErVr1+arr76iSZMmFe4Oynl5eZjNZu655x4+/PBDunbtio+PDwEBAcaNOi9/AWRlZbFlyxb+/ve/A64/hn+t2ps0acKOHTvo0qULL730Ei1btmTPnj188803dO/evULU7uHhgc1m4+6776Zt27Y0aNCAEydOcPbsWZo2bcquXbuoXbs2devWBeDMmTMcPHiQO+64w+W1w7XrT09Pp1+/ftSrV48xY8YQHR1NVFQUBw8eJCsri+joaFeXbqhfvz6PPPII4eHh+Pj4sH37du644w4aN27MF198ga+vr3FfwdzcXL744gtiY2MrxPtvLzcc5SlbUVFRtGjRgvj4eH7++WeSkpK4cOECLVu2dHVpdvHw8MBsNhvTycnJ1KpVi8jISBdWdXX16tWjV69e9OjRA09PT9avXw9AnTp1aNOmDQsWLDDmjYyMxNfXlwsXLlSI/1B/rD01NdXoS0xMpF+/fsClRyM0atQIb29vLl68WCFqB2jZsiWNGjWisPDS41oPHTpEQEAA7du3p0mTJixevNiYNz8/n3bt2rmq1Kv6Y/3//e9/qVGjBgB33XWX8X8gICCAkydPEhER4bJaryYkJASTyURaWhr33XcfGzduZMOGDSQnJ3P//fcza9YsY16LxUL79u1dWG3p3PRhAjB+/HgiIiJ4/vnn2bRpE8OGDTOO57uD/Px8Fi1aRN++fdm2bVuFPN592eVDhwMGDCA5OZkzZ85gNpsZOnQoe/fuZenSpXzxxRfMnTuX6OhoKleu7OKK/+f3ta9atYrTp09jtVrx9vY25tm/fz+ffPIJHTt2pFKlivNU7MtftmazmcLCQjIyMoiIiMDDw4MhQ4Zw7NgxXnrpJYYMGcKOHTto2LChawv+gz/Wn56ebowz2P7/acxff/01o0ePJi8vj0aNGrms1uupWbMmEydO5PPPP+eRRx7hjTfe4Pbbb+eWW25h7NixDBo0iFWrVlXoMc9rcuHgf4WTk5Pj6hJK5cKFC7YlS5bY9u3b5+pSSiQ2Nta2atUqY/rbb7+1zZo1y9avXz/bsmXLXFhZ8f5Y+3fffWe75557bA888IBtxYoVLqyseFu3bi1yFuO5c+ds586ds61f
v96WlJTkwsrs88f6f/zxR9vKlSttsbGxtuXLl7uwMvtZLBabzXbp7K1NmzbZzp8/b9u6davto48+cnFlpadb0IvTWSwWPD09SUlJYcWKFXTp0oU9e/Ywe/ZsV5dWrD/W3rVrV3bt2sXzzz/P999/T9euXV1d4jVZrVZMJhOLFi0iNzeXtm3bMnfuXMLCwhg/fnyF3Zu97Gr1z5s3j9DQUEaOHEnNmjVdXWKJLF++nE2bNpGYmGgcsnNnFWc/XG4aly/sy8zM5JtvvqFSpUoMHDgQoMKfvfLH2j09PXnkkUcIDg6ucCc8/JHJZKKgoIBly5ZRUFDA0aNHGTVqFG3btnV1aXa5Wv0jR450m/rhf3cb+OSTT6hbty5Dhw69IYIEFCbiIlu3buXw4cN8+OGHNG/e3GivyEFy2bVqdweVKlXinnvuoVu3bm55xwR3r79atWoEBQUxefJkt/vsFEeHucQlKvoeyPW4c+0i5UVhIiIiDtOpwSIi4jCFiYiIOExhIiIiDlOYiIiIwxQmIhXYsmXLeOSRR0q0zNSpU1m2bFn5FGSH9u3bk5aWdt15jh8/zt/+9jcuXLjgpKqkvClMpMKKjIw0/jRt2pSIiAhjOiUlpdTrPX/+PKGhoRw/fvya85TmS9xRP/30k8PXTpw4cYLU1FT69u0LwB133MGmTZuM/q1btxIaGnpFW3R0tFMfFRsUFERERAQrV6502jalfClMpMLauXOn8SckJIQFCxYY03369HF1eRXSihUr6N69u3FjxKioqCJ7Cdu3b+fWW2+9oq1ly5YlflTsxYsXHaq1T58+JCUlObQOqTgUJuK2LBYLc+fOpVu3brRp04ZnnnmGs2fPArBy5Up69uzJuXPnANi4cSOdOnXi119/5cEHHwQgJiaGyMjIIr/S7XHmzBmee+452rdvT+fOnXn99deNX/XLli1j4MCBvPzyy0RFRdG9e3e++uorY9kjR44QFxdHZGQkjz76KPHx8cZz1wcMGIDFYjH2vvbv3w9cukjyWuv7o88//9y4my5A69at2b59uzGdlpbG0KFDr2i7vIzFYuG1116jS5cu3H777YwbN468vDzgf3tOH3zwAZ07d+axxx4DLt1jqkuXLrRt25Y333yzSD3ffvstd911Fy1btqR9+/bMnDnT6GvZsiUHDx7k1KlTJXj3paJSmIjbWrx4MV9++SXvv/8+n332GZUrV2batGkA3H333TRp0oRXXnmFU6dOMWHCBKZNm0aNGjVYunQpAOvXr2fnzp3GQ6zsNWbMGHx9fdm0aZNxs77k5GSjPy0tjfDwcLZt28aAAQMYP348cCkU/vGPf9CmTRu2bdvGY489VuRw3ZIlS/D09DT2vi4f8rrW+q7mwIED3HLLLcZ0VFQU33//Pfn5+Vy4cIGDBw8SGxvL8ePHjbbdu3cbYZKUlMTHH3/M0qVL2bBhAzk5ObzyyivG+iwWC7t372b9+vXMmzeP/fv3M3XqVF599VU+++wz0tPTOX36tDH/5MmTefzxx9mxYwepqalF3msvLy/q1q3LDz/8UKL3XyomhYm4raSkJJ555hnq1KmDl5cXw4cPZ926dcbzLSZNmsTmzZt55JFH6N27d5k8cCg9PZ20tDTGjh1LlSpVqF27Ng899BBr16415rnlllu466678PT05K677iI9PZ2zZ89y5MgR/vvf/zJ8+HDMZjNt2rShU6dOxW7zWuv7I4vFwrlz5/Dx8THabr31VmrWrMmOHTvYt28foaGhmM1m/vKXvxhtJpPJCK7Vq1fz6KOPUrduXapVq8bo0aNZvXo1v79RxqhRo6hSpQre3t6sX7+enj17EhkZidls5umnny4y9lKpUiWOHDnC6dOnqVatGn/5y1+K1Ozj43PV1yLuRzd6FLdks9k4fvw4jz32WJH7ZFmtVk6fPo2/vz81a9akR48eLFu2jIULF5bJdjMyMjh//nyRJxFarVYaNGhgTNeqVcv4e5UqVQA4d+4c
WVlZ+Pv7F3kyZlBQEPn5+dfd5rXWV7169SLzeXp64uPjc8X6Lo+b+Pj4EBUVBUCrVq2MtpYtWxoP8srKyjIe3wuXnhBYUFDAmTNngEt37v39s9WzsrIICgoypn19ffH19TWmExISmDNnDjExMfzpT39i1KhRdOzY0ejPz8+/4nWIe1KYiFvy8PCgTp06zJkz55pPpduzZw9r164lJiaGl19+mfnz5xvLllZQUBBVq1Zl+/btJV5PYGAgOTk5FBYWGoFy/Phx48u3LG4eGRoaypEjRwgNDTXaWrduzbp16/D19TXOUIuKiiIhIQFfX18jYABq165Nenq6MZ2RkYG3tzd+fn7k5ORcUWNgYGCRs+Jyc3PJzc01phs1asTs2bOxWCysXbuWkSNH8s0332A2mzl//jzp6ek0bdrU4dctrqfDXOK24uLimDlzJpmZmQBkZ2ezefNmAH777TeeffZZnn/+eV555RUOHz7MihUrgEuPfvX19eWXX3657vqtVivnz583/hQWFlK/fn1atGhBYmIieXl5WK1Wjhw5Uux1FXDpcNUtt9zCvHnzuHDhAtu3b+fzzz83+gMCArBYLGRkZJT2LaFz585FBtfhUnDs3buX3bt3G4eZmjVrxqFDh/j222+LDNjHxsby1ltvkZGRQV5eHrNnzyY2NvaaQderVy82bNjA7t27KSwsZPbs2UXOCrv8eGNPT098fX3x8PAw1rVjxw5uu+22Inte4r4UJuK2hgwZQrt27Rg4cCCRkZHExcUZZ0C98sorNGrUiL59++Lt7U1iYiKJiYkcO3YMuHTc/6mnniIqKopPPvnkquvftm0bERERxp8WLVoAMHPmTHJzc+nVqxfR0dGMHj2a7OzsYuv18PDg1VdfNa7rWLBgATExMcZeSo0aNRgyZAj33HOPMXBeUnfffTebNm2isLDQaAsNDaVq1arUq1ePqlWrAlC5cmWaNm1KYWEhERERxrwPPPAAPXr0IC4ujh49euDn58e4ceOuub1mzZoxduxYRo0aRadOnQgODi7yxMP//Oc/xllzs2bNYvbs2VSuXBm4ND4TFxdX4tcoFZNuQS/iQsOGDaNFixY8/vjjZbbOadOm0bBhQx544IEyW2dZO3HiBIMHD2bVqlVGuIh7U5iIONHu3bsJCAggJCSELVu2MGrUKFatWkXjxo1dXZqIQzQAL+JEx48fZ+TIkZw9e5agoCCmTp2qIJEbgvZMRETEYRqAFxERhylMRETEYQoTERFxmMJEREQcpjARERGHKUxERMRh/w9qysZdRNjGbwAAAABJRU5ErkJggg==\n"},"metadata":{},"output_type":"display_data"}]},{"cell_type":"markdown","source":"Tweets are definitely more likely to contain fewer words due to the character restriction. 
It is easy to see how the number of tweets begins to drop after 7 words.","metadata":{"tags":[],"cell_id":"c2c971d4-d9af-4222-998e-6ed89ac16438"}},{"cell_type":"markdown","source":"As long as we are analyzing text length, why don't we look at text length vs. selected text length (chars)?","metadata":{"tags":[],"cell_id":"c7e55223-13c2-4b36-998d-ec531ef05cf7"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"d11080be-51d9-4875-8a3a-33e261e06898"},"source":"cat_sentiment = {\"positive\":1,\"neutral\":0,\"negative\":-1}\r\ndf['cat_sentiment'] = df.sentiment.apply(lambda x: cat_sentiment[x])\r\ndf['cat_sentiment']","outputs":[{"output_type":"execute_result","execution_count":17,"data":{"text/plain":"0        0\n1       -1\n2       -1\n3       -1\n4       -1\n        ..\n27476   -1\n27477   -1\n27478    1\n27479    1\n27480    0\nName: cat_sentiment, Length: 27481, dtype: int64"},"metadata":{}}]},{"cell_type":"markdown","source":"Does tweet length or number of words have any correlation with sentiment? 
If it does, it would be a valuable feature to introduce and could contribute to better model performance.","metadata":{"tags":[],"cell_id":"1abe0532-8e09-4d5a-87fa-8e190c7cc519"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"033aa8dc-b4c5-4d39-b9cb-b145ab22468b"},"source":"# Restrict the correlation to the numeric columns; the remaining columns are free text\r\ncorr_matrix = df[['text_len', 'num_words', 'cat_sentiment']].corr()\r\ncorr_matrix","outputs":[{"output_type":"execute_result","execution_count":18,"data":{"application/vnd.deepnote.dataframe+json":{"variableDetails":{"dataframe":{"text_len":{"text_len":1,"num_words":0.961234127294361,"cat_sentiment":0.001506637044493506},"num_words":{"text_len":0.961234127294361,"num_words":1,"cat_sentiment":-0.01812998737575555},"cat_sentiment":{"text_len":0.001506637044493506,"num_words":-0.01812998737575555,"cat_sentiment":1}},"columns":[{"name":"text_len","stats":{"count":3,"mean":0.6542469214462848,"std":0.5656218766829622,"min":0.001506637044493506,"25%":0.48137038216942724,"50%":0.961234127294361,"75%":0.9806170636471805,"max":1,"nan_count":0}},{"name":"num_words","stats":{"count":3,"mean":0.6477013799728685,"std":0.5769525591681357,"min":-0.01812998737575555,"25%":0.4715520699593027,"50%":0.961234127294361,"75%":0.9806170636471805,"max":1,"nan_count":0}},{"name":"cat_sentiment","stats":{"count":3,"mean":0.32779221655624596,"std":0.5822318072266657,"min":-0.01812998737575555,"25%":-0.00831167516563102,"50%":0.001506637044493506,"75%":0.5007533185222468,"max":1,"nan_count":0}}],"frequencyInfo":[{"frequencyData":[{"x":0.001506637044493506,"y":1},{"x":0.10135597334004416,"y":0},{"x":0.2012053096355948,"y":0},{"x":0.30105464593114545,"y":0},{"x":0.4009039822266961,"y":0},{"x":0.5007533185222468,"y":0},{"x":0.6006026548177974,"y":0},{"x":0.7004519911133481,"y":0},{"x":0.8003013274088987,"y":0},{"x":0.9001506637044493,"y":2}],"type":"hist"},{"frequencyData":[{"x":-0.01812998737575555,"y":1},{"x":0.08368301136181999,"y":0},{"x":0.18549601009939554,"y":0},{"x":0.2873090088369711,"y":0},{"x":0.3891220075745466,"y":0},{"x":0.49093500631212217,"y":0
},{"x":0.5927480050496977,"y":0},{"x":0.6945610037872733,"y":0},{"x":0.7963740025248488,"y":0},{"x":0.8981870012624243,"y":2}],"type":"hist"},{"frequencyData":[{"x":-0.01812998737575555,"y":2},{"x":0.08368301136181999,"y":0},{"x":0.18549601009939554,"y":0},{"x":0.2873090088369711,"y":0},{"x":0.3891220075745466,"y":0},{"x":0.49093500631212217,"y":0},{"x":0.5927480050496977,"y":0},{"x":0.6945610037872733,"y":0},{"x":0.7963740025248488,"y":0},{"x":0.8981870012624243,"y":1}],"type":"hist"}]},"numElements":3,"numColumns":3},"text/plain":"               text_len  num_words  cat_sentiment\ntext_len       1.000000   0.961234       0.001507\nnum_words      0.961234   1.000000      -0.018130\ncat_sentiment  0.001507  -0.018130       1.000000","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>text_len</th>\n      <th>num_words</th>\n      <th>cat_sentiment</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>text_len</th>\n      <td>1.000000</td>\n      <td>0.961234</td>\n      <td>0.001507</td>\n    </tr>\n    <tr>\n      <th>num_words</th>\n      <td>0.961234</td>\n      <td>1.000000</td>\n      <td>-0.018130</td>\n    </tr>\n    <tr>\n      <th>cat_sentiment</th>\n      <td>0.001507</td>\n      <td>-0.018130</td>\n      <td>1.000000</td>\n    </tr>\n  </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"markdown","source":"Unfortunately, but as expected, there is no meaningful correlation between sentiment and either text length (0.0015) or number of words (-0.018). Also as expected, there is a strong correlation between number of words and text length (0.96), as the more words you have, the longer the text would 
be.","metadata":{"tags":[],"cell_id":"817c24ee-2bb1-424d-bb9a-65ee7f50d5a5"}},{"cell_type":"markdown","source":"Let's now take a look at how many unique words we are dealing with in the corpus.","metadata":{"tags":[],"cell_id":"3b37828f-45e3-4ebe-98ec-f2ba4c860acb"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"b71a1305-2765-424b-b02c-4438c15683e9"},"source":"# Collect every lowercased, whitespace-separated token that appears in any tweet\r\nresults = set()\r\nfor row in df.text:\r\n    results.update(str(row).lower().split())\r\nlen(results)","outputs":[{"output_type":"execute_result","execution_count":19,"data":{"text/plain":"45433"},"metadata":{}}]},{"cell_type":"markdown","source":"There are 45,433 unique whitespace-separated tokens in this dataset (after lowercasing). This will be useful information during the tokenization step.","metadata":{"tags":[],"cell_id":"ced4f37e-f4dc-4db3-94e2-1f500d17d693"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"74f4cafa-c4c9-4416-8450-e402368138e8"},"source":"results","outputs":[{"output_type":"execute_result","execution_count":20,"data":{"text/plain":"{'until',\n 'tritonlink',\n 'dent',\n 'glass',\n 'exam.',\n 'full',\n 'ï¿½-net',\n 'greetings!!',\n 'pearlyn`s',\n 'pub.',\n \"'good'\",\n 'beck',\n '^__^',\n 'beyonce',\n 'collegue.',\n 'goody',\n 'guess...',\n 'forbiddenyou',\n 'cloggin',\n 'mommas',\n 'co-worker',\n 'premiers',\n 'rihanna',\n 'chiquita',\n 'twinz!',\n 'nephew',\n 'pick',\n 'lovely,',\n 'dogs!!!',\n 'blueberry',\n 'whoot',\n 'vera!!',\n 'rehearsal',\n 'michelle',\n 'nonsense',\n 'dfizzy',\n 'mobiles',\n 'http://tinyurl.com/ry9wap',\n 'wavy',\n 'badges',\n 'hardcore',\n 'operation',\n \"'tru\",\n 'synced',\n 'rio',\n 'everything!',\n 'carrier',\n 'festival?',\n 'withdrawal',\n 'yup...as',\n 'policy',\n 'return.',\n '_elmo_****',\n 'forgotten',\n 'row..wishing',\n 'knoww...',\n 'clutter',\n 'identity',\n 'counts',\n 'kaare',\n '_rankin',\n 'mine,',\n 'same...why',\n 'couch.',\n 'suspended,',\n 'hahaaaha',\n 'raha',\n 'http://tinyurl.com/64ozr7',\n 'michaels',\n '_j_stuart',\n 'ring',\n 'dne',\n 'sunny!',\n 'talinda',\n 
'place!!!!!!!!!!!',\n 'birmingham',\n 'hi...!!!',\n 'yum,',\n 'wrist',\n 'valuable',\n 'fof',\n 'otara!',\n 'diary',\n 'cups,',\n 'hurting,',\n 'everrrr.',\n 'throooooooooooooo!!!!!!',\n 'blackjack',\n 'eunice',\n 'lotion',\n 'lurvee',\n '`no`',\n 'mom.',\n 'thru...',\n 'free!)',\n 'iplayer',\n 'early??',\n 'story?!',\n 'do??',\n 'eye',\n 'ca...',\n 'lives.',\n 'whoaa',\n 'okc',\n 'tiredd',\n 'dixon',\n 'hammock...no',\n '9followers',\n 'nosebleed.',\n 'tigerheat',\n 'northland!',\n 'http://loopt.us/f8_jqg.t',\n 'steam',\n 'unemployed',\n 'impatient,',\n 'bonkers?',\n '****!!!!!!',\n 'wife,',\n 'comix',\n 'gladiators',\n 'lolly',\n 'macedonia:',\n 'cell!',\n '$160',\n '_sara',\n 'achy',\n 'vhits',\n 'ashley!',\n 'artesia',\n 'a#@',\n 'reports',\n 'innn',\n 'bajan',\n 'evenn',\n 'nostalgic.',\n 'sempit',\n 'were,',\n 'whiny',\n 'appartment',\n 'queens.',\n 'outdoor',\n 'tax',\n 'things!',\n '_ryan',\n 'http://bit.ly/9vbzg',\n 'toll..',\n 'hehaheahaaaa',\n 'sleeping!!',\n 'euggh',\n 'zul`jin',\n 'socks',\n 'mann.',\n 'riot..now',\n 'beer!!',\n 'patronum!!!',\n 'u....',\n 'follow!!',\n 'holllaaa..',\n \"ova!'\",\n 'bruised.',\n 'toilet',\n 'france',\n 'guessed',\n '330',\n 'degr',\n 'thats',\n 'parsley-',\n 'foot...so',\n 'weather,',\n 'ericsson',\n '(action)',\n 'cup*',\n 'bode',\n 'promises',\n 'calling.',\n 'profy!,.just',\n 'snowball',\n 'arthritus',\n '5.30am',\n 'raffle?',\n 'ny.',\n 'anna.',\n \"location',\",\n 'empowering,',\n 'cutee!!',\n 'blankie',\n 'exposed',\n 'cave,',\n 'blocc',\n 'thatï¿½s',\n 'neighboures.',\n 'jon4lakers',\n 'darn,',\n 'www.myspace.com/destroyalllines',\n '182`s',\n 'bledel',\n 'hydro',\n 'charlene',\n 'quarters.',\n 'mjb',\n 'dressing',\n '8pm.',\n 'musicans',\n 'lurveeeeee',\n 'thought....',\n 'http://bit.ly/qxxhi',\n 'screenshots',\n 'food!!',\n 'pai',\n 'w/me!',\n 'non-resident',\n 'alwas',\n 'bart:',\n 'butï¿½nadaï¿½',\n 'rude,',\n 'left',\n 'code',\n 'tht',\n 'http://twitpic.com/3bnas',\n 'share',\n 'boring...',\n 'angus',\n 
'`today',\n 'pun`k',\n 'flying',\n '*sniff',\n '`em!!!',\n 'english!!!!',\n 'noboby`s',\n 'ghosts',\n 'reallyy',\n 'hiring!',\n 'describes',\n 'visitng',\n 'coool',\n 'fgs',\n 'digest`',\n 'breakingg',\n 'dynamicproxy',\n 'fry',\n 'broadband',\n 'http://twitpic.com/4jkes',\n 'hayes.',\n 'bhaaji',\n 'cool!!!,',\n 'machine.',\n 'allmothers,',\n 'daylight',\n 'stapler...',\n 'iloveyoutoo.',\n 'tweedles',\n 'hawaii!!',\n 'felton.',\n 'http://vzerohost.com/info',\n 'vietnam,',\n 'find...trust',\n '5-4-09',\n 'around!',\n 'perfect.',\n 'enough',\n 'awoke',\n 'cb',\n 'http://www.ted.me/',\n 'not!',\n 'hanafiah',\n 'tee`s',\n 'relished',\n 'lovingly',\n 'back...been',\n 'lm',\n 'stuf...ppl`s',\n 'erm,',\n 'inhabit',\n 'state....',\n 'seen',\n 'tonite',\n 'crack-up',\n 'mom-the',\n 'dancing!!',\n 'tweeters,',\n 'great!!',\n 'poltergeist!',\n 'downed.',\n 'days-off',\n 'here:',\n '@_shutupandsmile',\n 'athens!',\n 'lives...i',\n 'laura!',\n 'prize?and',\n 'bar.',\n 'disappearing',\n '(fruitbat)?',\n 'what.',\n 'anyway...',\n 'arts',\n 'http://is.gd/wyyp',\n 'eppy!',\n 'mccafe=tastes',\n 'engulfed',\n 'weather..',\n '18hrs',\n '58',\n 'sucks!!!!',\n 'hiya',\n 'grading.',\n 'guide',\n 'tom.',\n 'embraced',\n 'machine)?',\n 'sophomore',\n 'sparkle',\n 'natalie',\n 'everyone-',\n 'footballers',\n 'databook.',\n 'garland',\n 'widget.',\n 'independent',\n 'mission',\n 'http://ustre.am/2w5v)',\n 'me.....',\n '8d',\n 'greentea.',\n 'corrections',\n 'babysitter,',\n 'comfy',\n 'lol..well..no.',\n 'crib..bout',\n 'web/graphic',\n 'martwo:',\n 'pain',\n 'playback',\n 'erin',\n 'yosemite.',\n 'buxton',\n 'thy',\n 'molar',\n '13',\n 'stanley',\n 'move...but',\n ':hard',\n 'daddy-size.',\n 'test',\n 'audi',\n '_hay',\n 'compromise',\n 'oxox',\n 'bummer!',\n 'users.',\n '*burst',\n 'labels',\n 'london?!',\n '@_erica',\n 'bride`s',\n 'spayed...',\n 'friday!',\n 'statue',\n 'us/can',\n 'jr',\n 'bullying',\n 'chal',\n '@_chellebelle_',\n 'tucson,',\n 'control',\n 'changing',\n 'flies??',\n 
'_26',\n 'rest',\n 'toro',\n 'texts????',\n 'phil!!!',\n 'rumored',\n '4-year-old!',\n '#britainsgottalent',\n 'lsat',\n 'maria.',\n 'onion',\n 'waiting...',\n 'dee....',\n 'library.',\n 'vacation,',\n 'http://twitpic.com/66vlw',\n 'bad...',\n 'nz',\n 'wooop!',\n 'latte',\n 'sweeeet!',\n 'course',\n 'death',\n 'stalkerishly',\n 'stupid.',\n 'tpc',\n 'clinch',\n 'scarf',\n 'schoolers',\n 'brandon.',\n 'http://twitpic.com/4jeij',\n '@_emily_young_',\n 'spike(car)',\n 'lauren',\n 'leeds',\n 'twitterworld?',\n \"song...'peter\",\n 'maddest',\n 'reyes',\n 'http://plurk.com/p/wyb4h',\n 'out!!!!',\n 'hand.',\n 'awards!!',\n 'cheat',\n '_bones',\n 'hols?',\n 'penance',\n '50p',\n 'good-byes',\n 'out:',\n \"'mom,\",\n 'uswitch.net',\n '167',\n 'people?',\n 'updates.they',\n 'letterman?',\n 'rieger',\n 'see',\n 'firemen',\n 'tweet!!!',\n 'hi,',\n 'clutters',\n 'u2',\n 'bits.',\n '`s`',\n 'bouquets',\n 'mums,',\n 'please,',\n 'fields!!',\n 'computer,',\n 'saget',\n \"'jaunty\",\n 'binstruct',\n 'dum',\n 'vcr',\n 'lonely',\n 'strategy.',\n 'laura`s',\n 'mccoy',\n '2:30.',\n 'apnea,',\n 'channel',\n 'bgeezy,',\n 'picture!',\n 'skills.',\n 'docent',\n 'spanish...any',\n 'seuss',\n 'frees',\n 'sweetie....tweethug',\n 'http://plurk.com/p/svxe1',\n 'beach!!',\n 'is..',\n 'wooohooo!',\n 'prematurely!',\n 'sanfran',\n '_hope',\n 'gpt',\n 'keswick',\n 'syncs',\n 'mad;',\n 'twittter',\n 'assistant/cook/nanny/chauffer',\n 'kelly<3',\n 'sigh.....',\n 'niamh',\n 'holds',\n 'working?',\n 'crazy,',\n 'or...reply',\n 'present,',\n 'javascript',\n 'words.',\n 'jobs?',\n 'yay!!!1',\n 'ladyhawke',\n 'order!',\n 'iop',\n 'days...but',\n 'sissy!!!!!!!',\n 'upkeep.',\n '3,144',\n 'circumstances,',\n 'crystal',\n 'vega:',\n '(aka',\n 'jean',\n 'attempting',\n 'awwh',\n 'satisfied.',\n 'room',\n 'fakes',\n 'rains,',\n 'somewhere',\n 'keren',\n 'exams..',\n 'rhythms',\n '(2nd',\n '#nightshift',\n 'smushed',\n '_longman',\n 'distraaaaacting',\n 'supernatural!',\n 'same!!',\n 
'http://twurl.nl/8q6cjc',\n 'phillies',\n 'jewelry.',\n 'friench',\n 'pikachu',\n '`unfollow`',\n 'honest.',\n \"say'bitch!'\",\n 'forza',\n 'asylum',\n 'coffee.',\n 'reznor',\n 'confiscate',\n 'rr',\n 'take-home',\n 'michelleeeeeeeeeeeemybelleeeeeeeeeeeeeeeeee',\n '`don`t',\n 'norway.',\n 'now..hmm',\n 'sleeeeepy!!!',\n ':*',\n 'rugby,',\n 'coolmax',\n 'rosary',\n 'bummed...you`re',\n 'pitts.',\n 'through..she',\n 'galaxy,',\n 'air..',\n 'euphoria',\n 'euh,',\n '#rrtheatre',\n 'terms',\n 'b`day!!',\n 'oil.',\n 'yaaayyy',\n '(chinese)',\n 'gosh,',\n 'http://twitpic.com/4t6qx',\n 'q13',\n '(story-wise)',\n \"'racism\",\n 'asprin',\n '#****',\n 'healthy,',\n '10mm',\n 'reallizing',\n 'plan...ran',\n 'parents.....that`s',\n 'chap',\n 'packing,',\n 'eggs,',\n 'obscurity.',\n 'nearby,',\n '4x4s',\n 'bridge',\n 'nurse.',\n 'xxxxxxx',\n 'posterrr!',\n 'pt/12am',\n 'fainting.',\n 'dogs.',\n 'yall',\n 'fallower',\n 'explanation?,',\n '-_-',\n 'travel...',\n 'soon!have',\n '5000',\n 'new..',\n 'dem',\n 'http://tinyurl.com/dl2upx',\n 'taco`s.',\n 'tmw',\n '30',\n 'non-www',\n 'prof`s',\n 'stress,',\n 'feelers',\n '1300',\n 'workout.',\n 'http://bit.ly/7vikc',\n 'haff',\n 'hurrr',\n 'disk-happiness',\n 'lol?',\n 'past.',\n 'compatible.',\n ')))))',\n 'gordon',\n 'spendin',\n '`transformers',\n 'thompson',\n 'service..will',\n 'illy`s',\n 'miranda',\n 'watched',\n 'ff!',\n '(25k/yr)',\n 'beeman`s',\n 'moss',\n 'http://tinyurl.com/oqsqz6',\n 'far?',\n 'ask.',\n 'wrk',\n '(blame',\n 'male',\n '_333',\n 'spring!',\n 'really..',\n 'ber-tweeter',\n 'virgins',\n 'trafficking',\n 'excedrine?',\n 'trailer!',\n 'disorganized',\n 'job...i',\n 'g-town',\n 'miz!!!',\n 'minh',\n 'vocab?',\n 'work!',\n 'tax-man',\n 'kmart',\n 'baka',\n 'lactose',\n '...so',\n 'flu!',\n 'http://tinyurl.com/mtfye3',\n 'vegetables',\n 'ambiguous)',\n 'maaaaaan',\n 'homes.',\n 'theatre....',\n 'mountains,',\n 'unbearable..&',\n 'acuerdo',\n 'hehe..',\n 'http://twitpic.com/4j6kc',\n 'capacity',\n 'mild',\n 
'broken.',\n 'lucien',\n 'derbyshire.',\n 'bolshoi?',\n 'revisingg',\n 'diffusing',\n 'sorted.',\n 'spamspamspam.',\n 'nap.',\n 'unassuming',\n 'ken',\n 'somebody',\n 'tonight...fox',\n 'currently',\n 'drinking',\n 'be?',\n 'cowbridge',\n 'editting',\n 'http://twitpic.com/4w8l1',\n 'kiddshow',\n 'burton',\n 'high...missing',\n 'gays',\n 'bberry',\n 'declaring',\n 'moves!)',\n \"space'\",\n 'nosebleeds,',\n 'bobobyebye!!',\n 'tinfoil',\n '_eyes',\n 'paycheck',\n 'passion',\n 'bladder,',\n '..stupid',\n 'preciate',\n 'friday.....i',\n 'yes',\n 'this...wash',\n 'belly.',\n 'kid-sitting',\n 'nightss.',\n '_c',\n 'divorcing',\n 'hej',\n '_you',\n 'mii',\n 'favs',\n 'sewing.',\n 'ability,',\n 'sucks.',\n 'ford',\n 'hoped',\n 'air!',\n 'fabulously40',\n 'alochol',\n 'allianz',\n '2night!',\n \"...'\",\n 'whine',\n 'justwatched',\n 'price.',\n 'life.love.stress',\n '*wabble',\n 'sayin`',\n 'mï¿½',\n \"'what?'...the\",\n 'arsenal',\n '#starwars',\n 'thing..not',\n 'http://mobypicture.com/?ee2ij3',\n 'speeds',\n 'shortstack.',\n 'chemist',\n 'loongerrr!',\n 'frisbee',\n 'sleeeeep.',\n '(raises',\n 'idea,',\n 'danstorce.',\n 'joining',\n 'nap...',\n 'uritors.',\n 'cm!',\n 'fam!',\n 'returns!',\n 'days...the',\n 'day..?!',\n '_union',\n \"'problem'\",\n 'donbt',\n 'video:',\n 'itching.',\n 'glyders.',\n 'vote!',\n 'duas!',\n 'http://bit.ly/kn3mp',\n '10.5%',\n 'summit?',\n 'ehhh',\n 'seeing,',\n 'core.',\n 'it???',\n 'guy...',\n 'congratulatory',\n 'haiszt..',\n 'repeatedly,',\n 'henning`s',\n 'amaaazing!',\n 'hilarious!!!!',\n 'khush',\n 'wiggity',\n 'toniight',\n 'parody.',\n 'brought',\n 'other,',\n 'admire',\n 'sakatas',\n 'personality!!!!',\n 'trippin`',\n 'tonyt',\n 'way.',\n 'pros',\n '_burnett',\n 'sp2',\n 'casablanca.',\n 'yeaah',\n 'nans',\n 'loveeeeeeee',\n 'haircuts?',\n 'mybrute!',\n 'sweat.',\n 'ians',\n 'coffee!!!!!',\n 't-9',\n 'wednesday?',\n 'finely',\n 'formulate',\n '_earedpages',\n 'underground',\n 'beddy',\n 'damm',\n 'stil...',\n 'tirith',\n 
'sufferin',\n 'while`',\n 'kettle',\n 'sit',\n 'cambridge',\n 'nuuuuu,',\n 'www.audiomicro.com',\n 'julyish???',\n 'pictures!',\n 'faceeee',\n 'http://plurk.com/p/sujth',\n 'aumfff',\n 'kesian',\n 'ignore',\n 'eta',\n '_deen',\n 'kiddnation',\n 'livestream.',\n 'bugger.',\n 'idea...',\n 'in2',\n 'important,',\n 'landlines.',\n 'friggin',\n 'mumm',\n \"one'...those\",\n 'threw',\n 'pcvs',\n 'vma`s?',\n 'dive..a',\n 'whatta',\n 'funny...u',\n 'swarm,',\n 'is',\n '8330',\n 'harris.',\n 'here...',\n 'hint!',\n 'p4',\n 'photo,',\n '2006',\n '12:45pm',\n 'satisfy',\n '*i',\n 'exam!!',\n 'coffee)',\n '1986.',\n 'would`ve',\n 'fiery',\n 'live360',\n 'anniversary.',\n \"feeling.'\",\n 'grandmother',\n 'couldn`t',\n 'hahahha',\n 'wooooo!!',\n 'fab.',\n 'you-brokeback',\n 'stare:',\n 'sarah.',\n 'while',\n 'relaxation..',\n 'umm.',\n '0`',\n '*huggles*',\n 'ago?',\n 'teleport',\n 'flickr.,',\n 'rusks',\n 'mode:',\n 'revision.',\n 'gandhi',\n 'promo...',\n '2moro!',\n 'livi',\n \"'ground\",\n '7.5',\n 'charmer',\n 'playah!',\n 'git',\n 'discovered',\n 'succumb',\n 'lol...thats',\n 'oprah,',\n 'guessing,',\n 'adrenaline',\n '3-1',\n 'recourse',\n '(yes',\n 'lately!',\n 'showwwww',\n 'bookmarked',\n 'changedd',\n 'bobby`s',\n 'hehe...nice',\n 'hur',\n 'coughed',\n 'outlook.',\n 'morgan`s',\n 'midday',\n 'late..',\n \"'where\",\n 'nice...',\n 'title',\n 'pj`s,',\n 'gimp',\n 'sequester...hope',\n 'cycled',\n 'awww!',\n 'library',\n '_gyrl',\n 'fabric',\n 'thay',\n 'juga',\n 'woohoo!!',\n '_north27',\n 'yumm',\n 'claus....such',\n 'sadifying',\n '11',\n 'yikes!',\n 'sir.',\n 'soz',\n \"'milk'...\",\n 'glory',\n 'trackball',\n 'zap!',\n 'glasgow',\n 'crew?',\n 'tshirts',\n 'allen',\n 'stockholm',\n 'ragazzi!!!',\n 'amazing!',\n 'swear.',\n '****--',\n 'kaka-tweak',\n 'unfort,',\n 'eleven..',\n 'netbook',\n \"'yummy'\",\n 'correctly!',\n 'remembering',\n 'ahh',\n '(good',\n '#costsavings',\n 'abt',\n 'on??!',\n 'familiar.....sorry',\n 'typed,but',\n 'jokes',\n '19...',\n 'biking',\n 
'haha...it',\n 'native.',\n 'reduce',\n 'osap',\n 'anything....but',\n 'mocha',\n 'want???',\n 'guinness',\n 'yeah....',\n 'nice,',\n 'achievement',\n 'hon!',\n 'ma`a',\n 'smackdown/ecw',\n 'cheesy',\n 'furious',\n 'startup.',\n 'cuzs',\n 'viper',\n 'drawls..',\n 'bubbled',\n 'snowdaysss..',\n 'minted',\n 'florida..not',\n 'break-up',\n 'issue.',\n 'spoon',\n 'drift',\n 'posts?',\n 'tickets!!',\n 'blamed.',\n \"sorry'\",\n 'quote,',\n 'scripts.',\n '_roe',\n 'rachel.',\n 'supernatural',\n 'music(both',\n '(exausted)',\n 'book-still',\n 'hr.',\n 'baptist',\n 'cont...and',\n '_stack',\n 'girl!',\n 'minibar...',\n 'war?',\n 'music?',\n 'games',\n 'gasp]',\n 'cs',\n ...}"},"metadata":{}}]},{"cell_type":"markdown","source":"Just by looking at the data, we can see misspellings, numbers, URLs, and trailing punctuation being counted as distinct words, so the true word count should be lower than this. Spellchecking and removing non-alphabetical characters should give a more accurate picture of how many words we are working with.","metadata":{"tags":[],"cell_id":"059a9cb3-70d8-4ea1-936e-fb71cf4fb519"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"fe942395-b0dc-4c49-9550-c5978a788055"},"source":"from sklearn.feature_extraction.text import CountVectorizer\r\nimport re\r\n\r\n# Drop null tweets first; str(NaN) would otherwise turn them into the token 'nan'\r\ndf = df[df[\"text\"].notna()].copy()\r\n# Replace every character that is not a letter, apostrophe or space with a space\r\ndf[\"text\"] = df.text.apply(lambda x: re.sub(\"[^a-zA-Z' ]\", ' ', str(x)))\r\nvectorizer = CountVectorizer(stop_words='english')\r\nX_train = vectorizer.fit_transform(df[\"text\"])\r\n# vocabulary_ maps each token to a column index, so its size is the number of unique tokens\r\nlen(vectorizer.vocabulary_)","outputs":[{"output_type":"execute_result","execution_count":21,"data":{"text/plain":"24304"},"metadata":{}}]},{"cell_type":"markdown","source":"Above, we remove all non-alphabetical characters and null values. 
The `CountVectorizer` then tokenizes the text, drops English stop words, and builds a vocabulary of the remaining unique tokens (a bag-of-words representation rather than a one-hot encoding); we output the size of that vocabulary.","metadata":{"tags":[],"cell_id":"dec39dba-cca8-4921-a2fb-cbe1daa21c30"}},{"cell_type":"code","metadata":{"tags":[],"cell_id":"2de76b80-bad3-4300-a6a8-e95b9dea3e12"},"source":"# The vocabulary keys are the unique tokens; sorting them lists the features in column order\r\nsorted(vectorizer.vocabulary_)","outputs":[{"output_type":"execute_result","execution_count":22,"data":{"text/plain":"['aa',\n 'aaa',\n 'aaaa',\n 'aaaaaaaaaaa',\n 'aaaaaaaaaahhhhhhhh',\n 'aaaaaaaaaamazing',\n 'aaaaaaaafternoon',\n 'aaaaaaaahhhhhhhh',\n 'aaaaaah',\n 'aaaaaahhhhhhhh',\n 'aaaaaawwwesome',\n 'aaaaahhhh',\n 'aaaaall',\n 'aaaaand',\n 'aaaaaoouoouoouu',\n 'aaaaarrrrggghhh',\n 'aaaaaw',\n 'aaaaawhh',\n 'aaaaawwwwwww',\n 'aaaagggessss',\n 'aaaagh',\n 'aaaah',\n 'aaaahh',\n 'aaaaiieee',\n 'aaaand',\n 'aaaargh',\n 'aaaau',\n 'aaaaw',\n 'aaaawww',\n 'aaah',\n 'aaahaha',\n 'aaahh',\n 'aaahhh',\n 'aaand',\n 'aaargh',\n 'aaarrrgggghhh',\n 'aaarrrggghhh',\n 'aaarrrgh',\n 'aaauuuggghhh',\n 'aaaw',\n 'aaaww',\n 'aaawww',\n 'aac',\n 'aaggh',\n 'aah',\n 'aahhh',\n 'aahs',\n 'aam',\n 'aannndd',\n 'aapko',\n 'aargh',\n 'aaron',\n 'aarrgghh',\n 'aasman',\n 'aawww',\n 'ab',\n 'aba',\n 'ababa',\n 'abalone',\n 'abandoned',\n 'abandoning',\n 'abandonment',\n 'abang',\n 'abba',\n 'abbey',\n 'abbie',\n 'abbreviate',\n 'abbreviation',\n 'abbster',\n 'abby',\n 'abc',\n 'abducted',\n 'abe',\n 'abean',\n 'aber',\n 'aberdeen',\n 'abhi',\n 'abhor',\n 'abi',\n 'abiding',\n 'abilities',\n 'ability',\n 'abit',\n 'able',\n 'ableton',\n 'abnormal',\n 'aboard',\n 'abouts',\n 'abouttttto',\n 'abp',\n 'abrams',\n 'abrjp',\n 'abroad',\n 'abs',\n 'absence',\n 'absent',\n 'absolute',\n 'absolutely',\n 'absolutley',\n 'absolutly',\n 'absolves',\n 'absoulutley',\n 'abstraction',\n 'abt',\n 'abuelo',\n 'abueltia',\n 'abundant',\n 'abuse',\n 'abusive',\n 'abuzz',\n 'aby',\n 'ac',\n 'acaban',\n 'academic',\n 'academically',\n 'academy',\n 'acara',\n 'acc',\n 'accadentally',\n 'accdentt',\n 'accel',\n 
'accela',\n 'accelerated',\n 'accent',\n 'accept',\n 'acceptable',\n 'acceptance',\n 'accepted',\n 'accepting',\n 'accepts',\n 'access',\n 'accessibility',\n 'accessible',\n 'accessing',\n 'accessories',\n 'accident',\n 'accidentally',\n 'accidentaly',\n 'accidentely',\n 'accidently',\n 'accidents',\n 'accompanied',\n 'accompany',\n 'accomplish',\n 'accomplished',\n 'according',\n 'accordion',\n 'account',\n 'accountant',\n 'accounting',\n 'accounts',\n 'acct',\n 'accumulating',\n 'accusations',\n 'accused',\n 'ace',\n 'aced',\n 'acee',\n 'acen',\n 'aceness',\n 'aces',\n 'acess',\n 'ach',\n 'achan',\n 'ache',\n 'aches',\n 'achieve',\n 'achievement',\n 'achieving',\n 'achillies',\n 'aching',\n 'achurley',\n 'achy',\n 'acid',\n 'acing',\n 'ack',\n 'ackles',\n 'acknowledge',\n 'acorn',\n 'acoustic',\n 'acquainted',\n 'acquired',\n 'acquiring',\n 'acs',\n 'acsm',\n 'acsvxdcbgfn',\n 'act',\n 'actially',\n 'acting',\n 'actinggg',\n 'action',\n 'actions',\n 'active',\n 'activities',\n 'activity',\n 'actor',\n 'actors',\n 'actress',\n 'actresses',\n 'acts',\n 'actual',\n 'actually',\n 'acuerdo',\n 'acute',\n 'acw',\n 'ad',\n 'ada',\n 'adam',\n 'adams',\n 'adaptation',\n 'adaptec',\n 'adapted',\n 'adapter',\n 'adaptor',\n 'add',\n 'added',\n 'addict',\n 'addicted',\n 'addicting',\n 'addiction',\n 'addictive',\n 'addicts',\n 'addin',\n 'adding',\n 'addison',\n 'addit',\n 'additional',\n 'address',\n 'addressed',\n 'addresses',\n 'adds',\n 'addy',\n 'ade',\n 'adelaide',\n 'adele',\n 'adelitas',\n 'adem',\n 'adequately',\n 'adica',\n 'adidas',\n 'adiel',\n 'adios',\n 'adium',\n 'adjusting',\n 'adjustment',\n 'admeeet',\n 'admin',\n 'administrators',\n 'admiration',\n 'admire',\n 'admirer',\n 'admission',\n 'admit',\n 'admitted',\n 'admitting',\n 'admk',\n 'adn',\n 'ado',\n 'adoarble',\n 'adobe',\n 'adobo',\n 'adopt',\n 'adopted',\n 'adopting',\n 'adoption',\n 'adoptive',\n 'adorable',\n 'adore',\n 'adoreeeee',\n 'adoreiii',\n 'adoring',\n 'adoro',\n 'adrenaline',\n 'adress',\n 
'adriana',\n 'adriii',\n 'ads',\n 'adsense',\n 'adt',\n 'adult',\n 'adulthood',\n 'adults',\n 'advance',\n 'advanced',\n 'advantage',\n 'advantages',\n 'adventure',\n 'adventures',\n 'advert',\n 'advertise',\n 'advertisement',\n 'advertising',\n 'adverts',\n 'advice',\n 'advil',\n 'advise',\n 'advised',\n 'adwancrd',\n 'ady',\n 'ae',\n 'aeg',\n 'aen',\n 'aeneid',\n 'aerlingus',\n 'aero',\n 'aerobars',\n 'aerobics',\n 'aeroplanes',\n 'aeropuerto',\n 'aerosmith',\n 'aesnk',\n 'aesthetic',\n 'af',\n 'afaik',\n 'afc',\n 'aff',\n 'affair',\n 'affairs',\n 'affect',\n 'affected',\n 'affecting',\n 'affects',\n 'affiliate',\n 'affiliation',\n 'affirmation',\n 'afford',\n 'afgan',\n 'afganistan',\n 'afh',\n 'afireinside',\n 'aflat',\n 'afraid',\n 'afraidiowe',\n 'africa',\n 'african',\n 'afrin',\n 'afro',\n 'afterall',\n 'afterjune',\n 'afterlife',\n 'afternoon',\n 'afternoons',\n 'afternooon',\n 'afterparty',\n 'afterpartying',\n 'afterthought',\n 'afterward',\n 'afterwork',\n 'aftie',\n 'aftrn',\n 'ag',\n 'agaaaaaaiiiin',\n 'againn',\n 'agane',\n 'agave',\n 'agcth',\n 'age',\n 'ageing',\n 'agen',\n 'agencies',\n 'agenda',\n 'agent',\n 'agents',\n 'ages',\n 'agessss',\n 'agfest',\n 'agg',\n 'aggghhhh',\n 'aggregate',\n 'aggressive',\n 'agh',\n 'aghh',\n 'aghhh',\n 'agile',\n 'agin',\n 'agitated',\n 'agnes',\n 'ago',\n 'agoraphobics',\n 'agree',\n 'agreeable',\n 'agreed',\n 'agreeing',\n 'agreement',\n 'agrees',\n 'agressiva',\n 'agressive',\n 'aguadilla',\n 'aguilera',\n 'aguirre',\n 'agus',\n 'agustin',\n 'agwl',\n 'ah',\n 'aha',\n 'ahaa',\n 'ahah',\n 'ahaha',\n 'ahahaay',\n 'ahahah',\n 'ahahaha',\n 'ahahahaha',\n 'ahahahahaha',\n 'ahahahahahaha',\n 'ahahahahahahahaha',\n 'ahar',\n 'ahasta',\n 'ahd',\n 'ahead',\n 'ahem',\n 'ahh',\n 'ahhaha',\n 'ahhahahaha',\n 'ahhh',\n 'ahhhaaaaa',\n 'ahhhh',\n 'ahhhhh',\n 'ahhhhhh',\n 'ahhhhhhh',\n 'ahhhhhhhh',\n 'ahhhhhhhhh',\n 'ahmazing',\n 'ahn',\n 'ahold',\n 'aholes',\n 'ahora',\n 'ahoy',\n 'ahseya',\n 'ahte',\n 'ahugs',\n 'ahve',\n 
'ai',\n 'aiaahh',\n 'aid',\n 'aidan',\n 'aiden',\n 'aids',\n 'aight',\n 'aigm',\n 'aiken',\n 'aila',\n 'ailun',\n 'aim',\n 'aimed',\n 'aimeeeeeee',\n 'aiming',\n 'aims',\n 'ain',\n 'aint',\n 'air',\n 'airbrush',\n 'airbrushed',\n 'airco',\n 'aircon',\n 'aired',\n 'airing',\n 'airline',\n 'airlines',\n 'airplane',\n 'airport',\n 'airports',\n 'airsoft',\n 'airtel',\n 'airy',\n 'aisa',\n 'aislinntighee',\n 'aitana',\n 'aitn',\n 'aiza',\n 'aj',\n 'ajax',\n 'ajc',\n 'ak',\n 'aka',\n 'akankah',\n 'ake',\n 'akh',\n 'akl',\n 'ako',\n 'aku',\n 'al',\n 'ala',\n 'alabama',\n 'alabang',\n 'aladdin',\n 'alaiiik',\n 'alam',\n 'alan',\n 'alarm',\n 'alarms',\n 'alas',\n 'alaska',\n 'alba',\n 'albany',\n 'albeit',\n 'albert',\n 'album',\n 'albums',\n 'albuquerque',\n 'alcohol',\n 'alcoholic',\n 'ale',\n 'alec',\n 'aleesha',\n 'alegre',\n 'alejandra',\n 'alenka',\n 'alerts',\n 'alex',\n 'alexa',\n 'alexajordan',\n 'alexander',\n 'alexanders',\n 'alexandra',\n 'alexi',\n 'alexis',\n 'alexxx',\n 'alfie',\n 'alfred',\n 'algae',\n 'algebra',\n 'algonquin',\n 'alhamdulilah',\n 'alhamdulillah',\n 'ali',\n 'aliante',\n 'alias',\n 'alice',\n 'alicev',\n 'alicia',\n 'alicias',\n 'alien',\n 'aliens',\n 'align',\n 'alike',\n 'alimony',\n 'alison',\n 'alissa',\n 'alittle',\n 'alive',\n 'alives',\n 'alkaline',\n 'alkek',\n 'alki',\n 'alkie',\n 'allah',\n 'allahpundit',\n 'allegra',\n 'allen',\n 'allens',\n 'allergic',\n 'allergies',\n 'allergy',\n 'alley',\n 'allianz',\n 'allies',\n 'allison',\n 'alll',\n 'allll',\n 'alllll',\n 'alllllll',\n 'allllllllll',\n 'alllllllllllllllll',\n 'alllllllllllllllllllllllllllll',\n 'allllllllright',\n 'allmost',\n 'allmothers',\n 'allo',\n 'allow',\n 'allowance',\n 'allowd',\n 'allowed',\n 'allowing',\n 'allright',\n 'allsort',\n 'alltel',\n 'alltime',\n 'alltop',\n 'allyson',\n 'alma',\n 'almaden',\n 'almighty',\n 'almond',\n 'almonds',\n 'almos',\n 'almostt',\n 'alo',\n 'alochol',\n 'aloe',\n 'aloha',\n 'aloof',\n 'alot',\n 'alotment',\n 'aloud',\n 
'alough',\n 'alpha',\n 'alphabet',\n 'alphonso',\n 'alpine',\n 'alreadi',\n 'alreadt',\n 'alreadyyyy',\n 'alreay',\n 'alredy',\n 'alrer',\n 'alright',\n 'alrightttt',\n 'alrighty',\n 'alriiightt',\n 'alrite',\n 'alryt',\n 'alt',\n 'altaf',\n 'altanta',\n 'alter',\n 'altered',\n 'alternate',\n 'alternating',\n 'alternative',\n 'alternatively',\n 'alternatives',\n 'alternator',\n 'altho',\n 'alto',\n 'alwas',\n 'alwaysss',\n 'alwayyyyyyyssssssss',\n 'aly',\n 'alyanna',\n 'alynn',\n 'alyso',\n 'alyson',\n 'alyssa',\n 'alzheimer',\n 'ama',\n 'amaaaazing',\n 'amaaazing',\n 'amadeus',\n 'amadou',\n 'amai',\n 'amaize',\n 'amanda',\n 'amandallynn',\n 'amandas',\n 'amanita',\n 'amanzimtoti',\n 'amar',\n 'amara',\n 'amarula',\n 'amason',\n 'amateur',\n 'amazake',\n 'amaze',\n 'amazecore',\n 'amazed',\n 'amazeeeeeee',\n 'amazes',\n 'amazin',\n 'amazing',\n 'amazingg',\n 'amazingggg',\n 'amazingly',\n 'amazinq',\n 'amazning',\n 'amazon',\n 'amazones',\n 'amazzzing',\n 'amber',\n 'ambers',\n 'ambien',\n 'ambience',\n 'ambiguous',\n 'ambulance',\n 'ambulances',\n 'ambyr',\n 'amcmain',\n 'ame',\n 'amen',\n 'amendment',\n 'amercia',\n 'america',\n 'american',\n 'americana',\n 'americanidolislove',\n 'americans',\n 'americas',\n 'amf',\n 'amherst',\n 'amhzz',\n 'ami',\n 'amigo',\n 'amigui',\n 'amish',\n 'amisha',\n 'amma',\n 'ammmmazing',\n 'ammo',\n 'ammoxxx',\n 'amo',\n 'amor',\n 'amorsote',\n 'amos',\n 'amost',\n 'amounts',\n 'amp',\n 'amritsar',\n 'amsterdam',\n 'amt',\n 'amtarot',\n 'amtrak',\n 'amused',\n 'amusing',\n 'amy',\n 'amything',\n 'amyyyy',\n 'ana',\n 'anais',\n 'analog',\n 'analysis',\n 'analytics',\n 'anathem',\n 'anatomy',\n 'anchorage',\n 'anchoring',\n 'anchovies',\n 'ancient',\n 'andd',\n 'anddd',\n 'anderson',\n 'andheri',\n 'andim',\n 'andre',\n 'andrea',\n 'andrew',\n 'andrews',\n 'android',\n 'andshehopes',\n 'andswere',\n 'andy',\n 'ane',\n 'anekie',\n 'anerexic',\n 'anew',\n 'ang',\n 'ange',\n 'angel',\n 'angela',\n 'angelina',\n 'angels',\n 'anger',\n 
'angetan',\n 'angie',\n 'angle',\n 'angles',\n 'angrily',\n 'angrrry',\n 'angry',\n 'angsty',\n 'angus',\n 'anh',\n 'ani',\n 'animal',\n 'animals',\n 'animated',\n 'animating',\n 'animation',\n 'anime',\n 'anisalovesu',\n 'anit',\n 'anita',\n 'aniya',\n 'anke',\n 'ankile',\n 'ankit',\n 'ankle',\n 'ankles',\n 'ann',\n 'anna',\n 'annabel',\n 'annabelle',\n 'annalisa',\n 'annapolis',\n 'annas',\n 'anndd',\n 'anne',\n 'anneliese',\n 'annie',\n 'anniemay',\n 'annivarsary',\n 'anniversary',\n 'annnd',\n 'annnnd',\n 'annnnnnddd',\n 'annonymity',\n 'announce',\n 'announced',\n 'announcement',\n 'announcements',\n 'announces',\n 'announcing',\n 'annoy',\n 'annoyed',\n 'annoying',\n 'annoyingly',\n 'annpyimg',\n 'annual',\n 'ano',\n 'anoher',\n 'anoop',\n 'anooyed',\n 'anotha',\n 'anothe',\n 'anouther',\n 'anr',\n 'ans',\n 'anshul',\n 'anstee',\n 'answear',\n 'answer',\n 'answerd',\n 'answered',\n 'answerer',\n 'answering',\n 'answerr',\n 'answers',\n 'ant',\n 'anthem',\n 'anthems',\n 'anthony',\n 'anthropomorphic',\n 'anti',\n 'antibiotics',\n 'antiboyle',\n 'anticipate',\n 'anticipating',\n 'anticipation',\n 'antics',\n 'antidisestablishmentarianism',\n 'antioch',\n 'antm',\n 'anto',\n 'antoinette',\n 'antomy',\n 'antonio',\n 'antony',\n 'ants',\n 'anurag',\n 'anwar',\n 'anway',\n 'anwb',\n 'anwhere',\n 'anxiety',\n 'anxious',\n 'anxiously',\n 'anyarticle',\n 'anybody',\n 'anychance',\n 'anyday',\n 'anyhoo',\n 'anyhooo',\n 'anymore',\n 'anymoree',\n 'anyones',\n 'anythgin',\n 'anythig',\n 'anythin',\n 'anytime',\n 'anyways',\n 'anywayss',\n 'anywayz',\n 'anywho',\n 'anyy',\n 'ao',\n 'aobut',\n 'aoki',\n 'aol',\n 'aot',\n 'aots',\n 'ap',\n 'apa',\n 'apaently',\n 'aparantly',\n 'aparently',\n 'apart',\n 'apartment',\n 'apartments',\n 'apathetic',\n 'apathy',\n 'ape',\n 'apearance',\n 'apetite',\n 'aphrodisiac',\n 'api',\n 'apicture',\n 'apl',\n 'aplikace',\n 'aplusk',\n 'aplyin',\n 'apm',\n 'apnea',\n 'apollo',\n 'apologetic',\n 'apologies',\n 'apologise',\n 'apologised',\n 
'apologize',\n 'apology',\n 'aporkalypse',\n 'app',\n 'apparantly',\n 'apparent',\n 'apparently',\n 'appartement',\n 'appartment',\n 'appeal',\n 'appealing',\n 'appeals',\n 'appear',\n 'appearance',\n 'appearances',\n 'appeared',\n 'appearing',\n 'appears',\n 'appending',\n 'apperently',\n 'appericiate',\n 'appetite',\n 'appetizing',\n 'applaud',\n 'apple',\n 'applebees',\n 'applebottoms',\n 'applejacks',\n 'apples',\n 'applescript',\n 'appliances',\n 'application',\n 'applications',\n 'applied',\n 'applies',\n 'apply',\n 'applying',\n 'appointment',\n 'appointments',\n 'appology',\n 'appraising',\n 'appreci',\n 'appreciate',\n 'appreciated',\n 'appreciating',\n 'appreciation',\n 'apprentice',\n 'approach',\n 'approaching',\n 'appropriate',\n 'appropriately',\n 'approve',\n 'approves',\n 'approving',\n 'approx',\n 'approximately',\n 'apps',\n 'appstore',\n 'appt',\n 'appts',\n 'april',\n 'apt',\n 'aptism',\n 'aptitude',\n 'aptw',\n 'apuya',\n 'apy',\n 'aquarius',\n 'aquatards',\n 'aquatic',\n 'aquats',\n 'aquestion',\n 'aquino',\n 'ar',\n 'ara',\n 'arabelle',\n 'arabic',\n 'arabs',\n 'arabyrd',\n 'aracheologist',\n 'arbit',\n 'arbiter',\n 'arbor',\n 'arc',\n 'arcade',\n 'arcadia',\n 'arch',\n 'arches',\n 'archetype',\n 'archetypes',\n 'archie',\n 'archies',\n 'architect',\n 'architecture',\n 'archive',\n ...]"},"metadata":{}}]},{"cell_type":"markdown","source":"This definitely looks a lot better, but there are still some misspelled words (such as 'apparantly' or 'appology'), and plurals (such as 'appointment' and 'appointments') might not need to be counted as separate words. These will be handled in preprocessing.","metadata":{"tags":[],"cell_id":"d321ffb6-f776-49f3-8301-ec4f442cd5b9"}}],"nbformat":4,"nbformat_minor":2,"metadata":{"orig_nbformat":2,"deepnote_execution_queue":[],"deepnote_notebook_id":"5b7bd68f-8782-4d27-9c43-896e81da6f59"}}
